AlgART arrays and matrices: generalized arrays and matrices of any Java types
and basic algorithms of their processing.
AlgART arrays are classes that store one- or multi-dimensional random-access arrays
containing elements of any Java type, including primitive types.
AlgART arrays are homogeneous: all elements
of an array have the same type (for primitive elements) or
are inheritors of the same class (for non-primitive elements).
AlgART arrays, unlike standard Java arrays, can be resizable:
you can add elements to the end of the array or remove some elements at any time.
The basic AlgART array interfaces are Array,
UpdatableArray, MutableArray:
one-dimensional arrays with any element type.
There are many subinterfaces with additional functionality and restrictions.
The basic interface for representing an AlgART multi-dimensional matrix is Matrix.
The classes Arrays and Matrices
offer a wide collection of useful functions for processing AlgART arrays and matrices.
The addressing of array elements is 63-bit. So, it's theoretically possible to
create and process arrays containing up to 2^63-1 (~10^19)
elements of any primitive or non-primitive types, if the OS and hardware can provide the necessary amount
of memory or disk space. Please see also the section
"The maximal supported array length" below.
Multi-dimensional arrays, named matrices, are supported via the
Matrix interface. Any one-dimensional
array can be viewed as a matrix and vice versa.
AlgART arrays are implemented with help of special factories, named
Virtual Memory Models (MemoryModel interface),
that provide a standard way of implementing any schemes for storing data,
from simple Java arrays to mapped disk files.
The current implementation offers 4 basic memory models:
the simplest and fastest Simple memory model,
that stores data in usual Java arrays;
Buffer memory model,
that uses Java NIO buffers for storing data:
it does not provide the maximal access speed, but can provide better overall performance
in applications that require a large amount of RAM, and also provides good compatibility
with Java code working with channels and NIO buffers;
advanced Large memory model,
that can store a large number of primitive elements in mapped files:
it is the only way (in current Java versions) to create very large arrays,
containing, theoretically, up to 2^63-1 elements;
special Combined memory model,
allowing efficient storage of non-primitive elements in a set of other arrays:
together with LargeMemoryModel, it allows storing
more than 2^31-1 non-primitive elements in one array.
Moreover, the Large memory model
is based on special low-level factories, named Data File Models
(DataFileModel interface).
Creating non-standard data file models makes it easy to store
array elements in almost any possible device or storage.
For example, it's possible to create a data file model that represents the content of a
BufferedImage
as an AlgART array.
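The factory idea behind memory models can be sketched in a few lines. The following is only an illustration under stated assumptions: IntStorage, StorageFactory, HeapStorageFactory, BufferStorageFactory and StorageDemo are hypothetical names invented for this sketch and are not the MemoryModel or DataFileModel interfaces of this package. It shows one algorithm working unchanged over storage backed by a Java array and over storage backed by a direct NIO buffer.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.IntBuffer;

// Hypothetical, simplified stand-ins; these are NOT the net.algart.arrays interfaces.
interface IntStorage {
    long length();
    int get(long index);
    void set(long index, int value);
}

interface StorageFactory {
    IntStorage newIntStorage(int length);
}

// Analogue of the "simple" scheme: elements live in an ordinary Java array.
final class HeapStorageFactory implements StorageFactory {
    public IntStorage newIntStorage(int length) {
        final int[] data = new int[length];
        return new IntStorage() {
            public long length() { return data.length; }
            public int get(long index) { return data[(int) index]; }
            public void set(long index, int value) { data[(int) index] = value; }
        };
    }
}

// Analogue of the "buffer" scheme: elements live in a direct NIO buffer.
final class BufferStorageFactory implements StorageFactory {
    public IntStorage newIntStorage(int length) {
        final IntBuffer data = ByteBuffer.allocateDirect(length * Integer.BYTES)
            .order(ByteOrder.nativeOrder())
            .asIntBuffer();
        return new IntStorage() {
            public long length() { return data.capacity(); }
            public int get(long index) { return data.get((int) index); }
            public void set(long index, int value) { data.put((int) index, value); }
        };
    }
}

final class StorageDemo {
    // The algorithm sees only IntStorage, so it works with any factory.
    static long sum(IntStorage storage) {
        long result = 0;
        for (long i = 0; i < storage.length(); i++) {
            result += storage.get(i);
        }
        return result;
    }

    // Fills a new storage with 1, 2, ..., length and returns the sum of its elements.
    static long fillAndSum(StorageFactory factory, int length) {
        IntStorage storage = factory.newIntStorage(length);
        for (int i = 0; i < length; i++) {
            storage.set(i, i + 1);
        }
        return sum(storage);
    }
}
```

A call such as StorageDemo.fillAndSum(new BufferStorageFactory(), 100) differs from the heap variant only in the factory passed, which is the point of separating the storage scheme from the array interfaces.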
AlgART arrays use memory as efficiently as possible. In particular, they use
only 1 bit per element when storing boolean values, 4 bytes per element
when storing int or float values, or 3*4=12 bytes per element when
storing Java objects consisting of 3 int fields, like the following Circle object:
class Circle {
    int x;
    int y;
    int r;
}
Unlike this, Java NIO buffers allow efficient storing of only non-boolean primitive elements,
and the standard ArrayList always stores every element as a separate instance,
which requires a lot of extra memory for simple structures.
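The memory figures above can be illustrated with plain Java arrays. This is a minimal sketch, assuming two hypothetical helper classes (PackedBits and CircleStore) that are not part of this package: bits are packed 64 per long, and circles are stored as three parallel int[] arrays instead of separate objects.

```java
// Hypothetical helper classes for illustration only; not part of net.algart.arrays.
final class PackedBits {
    private final long[] words;
    private final long length;

    PackedBits(long length) {
        this.length = length;
        this.words = new long[(int) ((length + 63) >>> 6)]; // 64 bits per long
    }

    boolean get(long index) {
        return (words[(int) (index >>> 6)] & (1L << (int) (index & 63))) != 0;
    }

    void set(long index, boolean value) {
        int word = (int) (index >>> 6);
        long mask = 1L << (int) (index & 63);
        if (value) {
            words[word] |= mask;
        } else {
            words[word] &= ~mask;
        }
    }

    long length() { return length; }

    // Payload size in bytes (the array object header is not counted).
    long bytesUsed() { return 8L * words.length; }
}

// Circles stored as three parallel int[] arrays: 12 bytes of payload per circle.
final class CircleStore {
    private final int[] x;
    private final int[] y;
    private final int[] r;

    CircleStore(int count) {
        x = new int[count];
        y = new int[count];
        r = new int[count];
    }

    void set(int index, int cx, int cy, int cr) {
        x[index] = cx;
        y[index] = cy;
        r[index] = cr;
    }

    int x(int index) { return x[index]; }
    int y(int index) { return y[index]; }
    int r(int index) { return r[index]; }

    long bytesUsed() { return 12L * x.length; }
}
```

With this layout, 1000 boolean values occupy 16 long words (128 bytes), and 10 circles occupy 120 bytes of payload, whereas a list of separate Circle instances would also pay for one object header and one reference per element.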
There are separate interfaces for almost all kinds of data access,
which makes the usage of arrays simpler and more stable.
Namely, there are separate interfaces
for read-only access (Array),
read-write access without changing the array length
(UpdatableArray),
stack access — adding and removing the last element
(Stack),
and full access including resizing the array
(MutableArray).
In addition, there are the DataBuffer interface,
providing convenient and maximally efficient block access to AlgART arrays,
and the DirectAccessible interface
for quick access to the internal Java array if the elements are
really stored in an underlying Java array.
There is also a full set of interfaces for quick and convenient access to
elements of all primitive types.
This architectural solution allows safe programming style,
when illegal array operations are syntactically impossible.
For example, the methods, which process an AlgART array argument and calculate some results,
but don't need any modifications of the passed array,
declare their array argument as Array or XxxArray,
where Xxx is Byte, Int, Float, ...
— interfaces containing only reading, but not writing methods.
The methods, which correct the array content, but don't need to add or remove array elements,
declare their array argument as UpdatableArray (or UpdatableXxxArray)
— interfaces containing only reading and writing, but not resizing methods.
This solution avoids the "optional" operations
used for implementation of read-only arrays in standard Java libraries.
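The effect of separating read-only and read/write interfaces can be shown with a small sketch. The names ReadableInts, UpdatableInts, HeapInts and Algorithms are hypothetical and much simpler than the real Array and UpdatableArray interfaces; the sketch only illustrates how a method signature documents and enforces the kind of access it needs.

```java
// Hypothetical, simplified interfaces; the real ones in this package are richer.
interface ReadableInts {
    int length();
    int get(int index);
}

interface UpdatableInts extends ReadableInts {
    void set(int index, int value);
}

final class HeapInts implements UpdatableInts {
    private final int[] data;

    HeapInts(int... values) { this.data = values.clone(); }

    public int length() { return data.length; }
    public int get(int index) { return data[index]; }
    public void set(int index, int value) { data[index] = value; }
}

final class Algorithms {
    // Read-only contract: the parameter type has no set method to call.
    static long sum(ReadableInts a) {
        long result = 0;
        for (int i = 0; i < a.length(); i++) {
            result += a.get(i);
        }
        return result;
    }

    // Read/write contract: elements may change, but there is no way to resize.
    static void negate(UpdatableInts a) {
        for (int i = 0; i < a.length(); i++) {
            a.set(i, -a.get(i));
        }
    }
}
```

Inside sum, a call like a.set(0, 1) does not compile, so an accidental modification is caught by the compiler rather than by a runtime exception.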
The basic set of AlgART array interfaces and classes can be represented as 3-dimensional structure.
The 1st dimension corresponds to the type of elements: there are separate interfaces and classes
for all 8 primitive Java types and for Object type (and its inheritors).
The 2nd dimension describes the array functionality:
read-only access (Array),
read/write access (UpdatableArray),
full access including resizing (MutableArray),
quick access to internal Java array (DirectAccessible).
The 3rd dimension is the Virtual Memory Model:
the scheme of storing array elements.
Below is a diagram of basic array interfaces and classes.
There are special superinterfaces for some groups of primitive types,
allowing to specify any array with elements from such groups.
The hierarchy is the following:
Also, all subinterfaces of PFixedArray and
PFloatingArray are grouped into the common interface
PNumberArray (any primitive types except boolean
and char, like the java.lang.Number class).
There are the same hierarchies for updatable and mutable arrays, but not for stacks.
The maximal supported array length
The maximal possible length of AlgART array depends on the
memory model, which has created this array.
An attempt to create an array with length exceeding the limit, specified by the memory model,
or an attempt to increase the length
or capacity
of an existing array over this limit, leads to
TooLargeArrayException (instead of the usual OutOfMemoryError).
The maximal array lengths for different memory models are listed below.
Simple memory model
The maximal array length is defined by the language limitations for arrays.
So, it cannot exceed 2^31-1, the maximal possible length of Java arrays,
except for bit arrays, which can contain up to 2^37-1 elements
because they are packed into long[].
However, real Java virtual machines usually limit the maximal size of arrays to 2^31-1 bytes (though the language theoretically allows defining an array with 2^31-1 elements). This reduces the maximal possible length of AlgART arrays.
Buffer memory model
The maximal array length is defined by the Java API limitations for the ByteBuffer class.
This API uses the int type for the buffer length
and allows creating direct NIO buffers only as views of ByteBuffer.
So, the limit is 2^31-1 bytes.
Large memory model
The maximal array length is limited by 2^63-1 bytes (the maximal
supported file length in the Java API and most OS), but also, of course,
cannot exceed the common limit of 2^63-1 elements
(which is a stricter limitation for bit arrays).
Combined memory model
The maximal array length depends on the corresponding limit for the memory model
that is used by the combiner,
which defines an algorithm of storing objects.
For example, if the storage is based on the Large memory model,
the maximal array length usually depends only on the available disk space.
5 levels of protection against unallowed changes
A frequent and important problem is protecting some application data
against unallowed changes, to avoid hard-to-find bugs connected with unexpected damage of application
objects. Below is typical code illustrating this problem:
DataClass a = ...; // some important data
someResults = someObject.process(a); // some method that needs read-only access to the argument
Here a is some "important" data, which must stay unchanged during the following call
of the process method. Maybe, for example, some parallel threads are reading this object now.
And someObject.process is a method which, theoretically, should not correct the passed data:
it only analyzes it and creates some result.
But it is quite possible that we are not sure that this method really fulfils this requirement.
Maybe its implementation is downloaded from the Internet (and can be written by a hacker
to damage our application). Or this method performs very complex tasks, and we cannot be sure
that it doesn't contain bugs.
Java arrays offer only one way to protect data against changes and to solve this issue:
cloning an array. An example:
int[] a = ...; // some important array
int[] protectedCopy = a.clone(); // independent copy ("protected" itself is a reserved word in Java)
someResults = someObject.process(protectedCopy);
Here the process method can corrupt the passed argument, but the original
array will stay unchanged.
Standard Java collections, as well as buffers from java.nio package, offer
an additional method: making an immutable view. For example:
List a = ...; // some important list
List protectedView = Collections.unmodifiableList(a); // read-only view, no copying
someResults = someObject.process(protectedView);
Now, if the process method tries to correct the passed argument, an exception
will be thrown. This solution has an advantage: no additional memory is required for storing
a clone.
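The behavior of such an immutable view can be checked with standard Java classes only. In the sketch below, the class name UnmodifiableViewDemo and the helper writeIsRejected are names chosen for this illustration; Collections.unmodifiableList and UnsupportedOperationException are the standard JDK API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

final class UnmodifiableViewDemo {
    // Returns true if an attempt to write through the view was rejected.
    static boolean writeIsRejected(List<Integer> view) {
        try {
            view.set(0, -1);
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<Integer> original = new ArrayList<>(Arrays.asList(1, 2, 3));
        List<Integer> view = Collections.unmodifiableList(original);

        System.out.println("write rejected: " + writeIsRejected(view));

        // The view is not a copy: later changes of the original are visible through it.
        original.set(0, 10);
        System.out.println("view.get(0) after changing the original: " + view.get(0));
    }
}
```

Note that the set method is still present in the List interface of the view; it is an "optional" operation that fails only at run time.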
The AlgART array architecture supports 5 ways of solving this task, including the 2 ways
described above. We compare all of them below.
1. Syntactical protection
The 1st solution is the simplest, fastest, but not safe enough. It is a syntactical solution.
If the process method does not need to modify its argument, it should be declared with
Array argument type, which doesn't contain any mutation methods at all:
public ResultType process(Array a);
A usage example:
Array a = ...; // maybe, there is MutableArray in this expression
someResults = someObject.process(a);
Of course, it is not a problem to write a "malicious" process method which
corrects its argument with a statement like the following: ((MutableArray)a).set(0, ...).
However, if the process method is written by you or by your colleagues,
and you only need to protect against possible bugs, the syntactical protection will help you.
An immutable view, returned by the Array.asImmutable() method, differs from the analogous
technique in standard Java libraries by the absence of "optional" operations.
In the case of Java collections, the process method is able to call a mutation method
on the passed collection, but this method throws an exception.
Unlike this, the Array.asImmutable() method returns
an object that does not implement any interfaces, and does not contain
any methods, that allow changing the data in any way.
The main defect of immutable views is that they disable any optimization based on direct access
to the stored data. For example, the DirectAccessible interface can allow access to
the Java array used internally for storing elements. If the process method needs a lot
of accesses to elements in random order, then using the DirectAccessible interface
in a separate algorithm branch can speed up the method several times, when the AlgART array is really
based on Java arrays. Unfortunately, an immutable view has no right to provide such direct access
to internal storage, because it is a possible way to corrupt the array.
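The difference between purely syntactical protection and an immutable view can be sketched as follows. All names here (ReadOnlyInts, WritableInts, ArrayInts, ViewDemo) are hypothetical and only imitate the idea; this is not the implementation of Array.asImmutable(). A consumer that tries a downcast succeeds on the original object, but fails on the view, because the view's class simply does not implement the writable interface.

```java
// Hypothetical, simplified interfaces; not the net.algart.arrays API.
interface ReadOnlyInts {
    int length();
    int get(int index);
}

interface WritableInts extends ReadOnlyInts {
    void set(int index, int value);
}

final class ArrayInts implements WritableInts {
    private final int[] data;

    ArrayInts(int... values) { this.data = values.clone(); }

    public int length() { return data.length; }
    public int get(int index) { return data[index]; }
    public void set(int index, int value) { data[index] = value; }

    // The view shares the storage but implements the read-only interface only.
    ReadOnlyInts asImmutable() {
        return new ReadOnlyInts() {
            public int length() { return data.length; }
            public int get(int index) { return data[index]; }
        };
    }
}

final class ViewDemo {
    // A buggy or "malicious" consumer: writes if a cast to the writable type is possible.
    static boolean tryToCorrupt(ReadOnlyInts a) {
        if (a instanceof WritableInts) {
            ((WritableInts) a).set(0, -1);
            return true;
        }
        return false;
    }
}
```

Passing the ArrayInts object itself, declared as ReadOnlyInts, lets tryToCorrupt change element 0; passing the result of asImmutable() does not, and no extra memory is spent on a copy.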
4. Trusted immutable view
It is a compromise between absolute safety, provided by cloning and immutable views,
and maximal efficiency, achieved while using syntactical protection only. An example of usage:
Unlike usual immutable view, a trusted immutable view may implement some interfaces,
that allow to change the array content — only if it is really necessary for
optimization. (The only example of such interface in this package is DirectAccessible.)
So, the process method can corrupt the original array a.
However, any changes in the original array will be detected by the following call
"protected.checkUnallowedMutation()" with almost 100% probability,
and if the array was changed, UnallowedMutationError will be thrown.
To detect changes, checkUnallowedMutation usually calculates a hash code
and compares it with the hash code calculated in asTrustedImmutable method.
This technique is a suitable solution if you trust the authors of process method
(for example, it is written by you or your colleagues), but this method is not trivial
and you are not sure that all possible bugs in this method are already fixed.
Unlike the other 4 protection methods, it is the only way to automatically detect such bugs:
so, trusted immutable views can be very useful at the stage of testing an application.
This solution has the following defects.
It requires a little additional time (for calculation of the hash code) while calling
asTrustedImmutable() and
checkUnallowedMutation() methods.
(However, like immutable views, this method does not require any additional memory.)
If there are other threads working with the original array,
then the unallowed changes of elements can lead to errors in those threads before
these changes are detected and UnallowedMutationError is thrown.
This method protects against algorithmic bugs, but not against a malicious method,
specially written by a hacker. If the author of the method has read the source code of this package,
he or she can perform changes of the array that will not be detected by this technique.
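The hash-based detection described above can be sketched with a plain int[] array. This is an illustration under assumptions, not the package's implementation: TrustedIntsView is a hypothetical class, the hash is the standard java.util.Arrays.hashCode, and IllegalStateException stands in for UnallowedMutationError.

```java
import java.util.Arrays;

// Hypothetical sketch of a "trusted" view; not the net.algart.arrays implementation.
final class TrustedIntsView {
    private final int[] data;        // shared with the owner, not copied
    private final int expectedHash;  // content hash taken when the view is created

    TrustedIntsView(int[] data) {
        this.data = data;
        this.expectedHash = Arrays.hashCode(data);
    }

    int length() { return data.length; }

    int get(int index) { return data[index]; }

    // Direct access for speed; a buggy caller could write through this reference.
    int[] javaArray() { return data; }

    // Recomputes the hash and reports a mismatch; IllegalStateException is a stand-in
    // for the error type mentioned in the text.
    void checkUnallowedMutation() {
        if (Arrays.hashCode(data) != expectedHash) {
            throw new IllegalStateException("array content was changed");
        }
    }
}
```

A typical use is to create the view before calling the processing method and to call checkUnallowedMutation() right after it returns. As with any hash comparison, a change that happens to preserve the hash value stays undetected, which matches the "almost 100% probability" wording above.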
5. Copy-on-next-write view
It is a more efficient alternative to cloning an array. This solution is also absolutely
safe, but, sometimes, it requires additional memory and time. An example:
Now process method may freely change the passed argument (if it implements necessary
interfaces). However, the first (not further!) attempt to modify the passed protected array,
or any other access that can lead to its modification (like DirectAccessible.javaArray()
method), will lead to reallocation of the underlying storage, used for array elements,
before performing the operation. It means that modification will not affect
the original a array, but only protected array.
This solution is the best choice, if you need a strict guarantee that the original array
will not be modified (that is not provided by trusted immutable views), and you don't need
a guarantee that no additional memory will be occupied. If the process method
will not try to modify the array or use optimization interfaces alike DirectAccessible,
then this solution will provide the maximal efficiency.
If the method will try to get direct access for optimization via DirectAccessible
interface, then the internal data will be cloned at this moment, that can require additional memory and time,
but all further accesses to elements will work with maximal efficiency.
This solution has the following defects.
It is not better than simple cloning if the processing method always uses direct
access to the storage for optimization goals (for example, via the DirectAccessible
interface), and the passed array really allows this optimization (for example, is based on Java arrays).
Good processing methods can use the Array.isCopyOnNextWrite() method
to choose the best behavior.
It does not help to detect bugs that lead to unallowed changes of array elements. But you
can use this technique together with a trusted immutable view:
"Array protected = a.asCopyOnNextWrite().asTrustedImmutable()" Such protection is also absolutely safe,
but also allows catching unallowed attempts of correction with the
checkUnallowedMutation() method.
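The copy-on-next-write behavior can be sketched over a plain int[] array. CopyOnWriteInts is a hypothetical class written for this illustration and is not the package's implementation; only the method name isCopyOnNextWrite is borrowed from the text. The storage is shared until the first write or the first request for direct access, and is cloned at that moment.

```java
// Hypothetical sketch of a copy-on-next-write view; not the net.algart.arrays implementation.
final class CopyOnWriteInts {
    private int[] data;             // initially the caller's array itself
    private boolean shared = true;  // true until the first write or direct access

    CopyOnWriteInts(int[] original) {
        this.data = original;
    }

    int length() { return data.length; }

    int get(int index) { return data[index]; }

    void set(int index, int value) {
        detach();
        data[index] = value;
    }

    // Direct access could be used for writing, so it also triggers the copy.
    int[] javaArray() {
        detach();
        return data;
    }

    boolean isCopyOnNextWrite() { return shared; }

    // Clones the storage once; later writes work on the private copy at full speed.
    private void detach() {
        if (shared) {
            data = data.clone();
            shared = false;
        }
    }
}
```

If the processing method only reads, no copy is ever made; if it writes or asks for the Java array, the cost is the same as cloning up front, but it is paid only when needed.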
Multithreading and synchronization
Immutable AlgART arrays are absolutely thread-safe
and can be used simultaneously in several threads.
Moreover, even if an AlgART array is mutable (for example, implements MutableArray),
but all threads accessing it only read data from it and do not attempt to modify the array in any way,
then this array is also thread-safe and no synchronization is required.
(The same rules are correct for usual Java arrays.)
If there are 2 or more threads accessing an AlgART array, and at least one of them modifies it
(for example, changes elements or the length), then you should synchronize access to the array.
Without external synchronization, the resulting data in the array will be unspecified.
However, if you do not use any methods from MutableArray /
Stack interfaces and their inheritors,
but only read and write elements via methods provided by the UpdatableArray interface
(and its versions for concrete element types),
then the behavior under simultaneous multithreaded access will be the same as for usual Java arrays.
In particular, access to one element will never affect other elements.
So, you can correctly work with several non-overlapping sets of elements of the same array
from several threads simultaneously without synchronization, if different threads work with different sets.
(Please compare: the standard java.util.BitSet class does not provide such guarantees.)
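Since the text states that the behavior matches usual Java arrays, the rule about non-overlapping sets can be illustrated with a plain int[] array. In this sketch (DisjointRangesDemo is a name chosen for the illustration) each thread writes only its own index range, and no locks are used; Thread.join() guarantees that the writes are visible to the caller afterwards.

```java
final class DisjointRangesDemo {
    // Fills data[i] = 2 * i, splitting the index range between several threads.
    static int[] fillInParallel(int length, int threadCount) {
        final int[] data = new int[length];
        Thread[] threads = new Thread[threadCount];
        for (int t = 0; t < threadCount; t++) {
            // Ranges [from, to) of different threads never overlap.
            final int from = (int) ((long) length * t / threadCount);
            final int to = (int) ((long) length * (t + 1) / threadCount);
            threads[t] = new Thread(() -> {
                for (int i = from; i < to; i++) {
                    data[i] = 2 * i;
                }
            });
            threads[t].start();
        }
        for (Thread thread : threads) {
            try {
                thread.join(); // after join, the worker's writes are visible here
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return data;
    }
}
```

The same pattern would not be safe for a structure where neighboring elements share one memory word without special care, which is why the comparison with java.util.BitSet is made above.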
Runtime exceptions and errors
The methods of classes implementing AlgART arrays can usually throw the exceptions declared in
the Javadoc comments to the methods. In addition, there are the following exceptions, which can be thrown by methods
and are not always specified in comments.
java.io.IOError can be thrown at any moment by any method processing an AlgART array,
just as OutOfMemoryError can be thrown by almost any Java method.
java.io.IOError is usually thrown when an array is based on some external file,
as in the Large memory model,
and there is some problem with access to this file, for example, not enough disk space.
One of the typical situations leading to this error is unexpected program termination
while some threads work with arrays created by the Large memory model.
In this case, the built-in shutdown hook is started to remove all temporarily allocated disk files,
and once it has started, almost any access to an AlgART array based on a temporary file
leads to IOError with IllegalStateException as a cause.
Warning: java.io.IOError can be also thrown while processing an AlgART array,
based on some external file, as in the Large memory model,
if the current thread is interrupted by Thread.interrupt() method.
Also this error is thrown if the Thread.interrupt() method is called for a thread,
that is currently performing multithread copying by copy method.
Usually, such behavior is not suitable. So, you should not try to interrupt
threads that process AlgART arrays via the Thread.interrupt() technique!
Please use an alternative technique: some volatile flag that requests interruption,
or the net.algart.contexts.InterruptionContext interface.
For interruption of Arrays.copy method,
you can also use the custom implementation of ArrayContext.
TooLargeArrayException can be thrown by methods,
which allocate memory in form of AlgART arrays, if the size of an allocated array is too large:
see comments to this exception. In other words,
this exception is possible in the same situations as the standard OutOfMemoryError,
but its probability is very low: usually an attempt to create an AlgART array that is too large leads to
other exceptions, like OutOfMemoryError or java.io.IOError with a "disk full" message.
Any RuntimeException can be thrown by methods, using
ArrayContext technique to allow interruption by user
(for example, by Arrays.copy method),
if the context, passed to this method, throws this exception in its
checkInterruption() method.
It is the recommended way of interrupting long-working methods, processing AlgART arrays.
java.io.IOError was added since JDK 1.6 only. Under Java 1.5, the similar
net.algart.arrays.IOErrorJ5 exception is thrown instead.
This error is package-private and may be excluded in future versions of AlgART libraries.
AssertionError can be thrown at any time if some bug is auto-detected in AlgART libraries.
Some internal checks (that can lead to this error in the case of a bug) are skipped
when Java is started without the -ea flag.
The most serious bugs, if they are auto-detected, lead to InternalError
instead of AssertionError.
We hope that these errors will not be thrown inside this package in your applications.
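The recommended alternative to Thread.interrupt(), a volatile flag checked by a context object, can be sketched as follows. The names InterruptionCheck, StopFlag and CancellableCopy are hypothetical and only imitate the idea of a checkInterruption() callback; they are not ArrayContext or InterruptionContext. The standard CancellationException is used here as an example of a RuntimeException thrown by the context.

```java
import java.util.concurrent.CancellationException;

// Hypothetical callback, imitating the idea of a context with checkInterruption().
interface InterruptionCheck {
    void checkInterruption(); // throws a RuntimeException to stop the work
}

// The flag is volatile, so a request made in one thread is seen by the worker thread.
final class StopFlag implements InterruptionCheck {
    private volatile boolean stopRequested;

    void requestStop() { stopRequested = true; }

    public void checkInterruption() {
        if (stopRequested) {
            throw new CancellationException("stopped by request");
        }
    }
}

final class CancellableCopy {
    // Copies src into dest block by block, asking the context before every block.
    static void copy(int[] dest, int[] src, InterruptionCheck context) {
        final int blockSize = 1024;
        for (int pos = 0; pos < src.length; pos += blockSize) {
            context.checkInterruption();
            System.arraycopy(src, pos, dest, pos, Math.min(blockSize, src.length - pos));
        }
    }
}
```

The worker is never interrupted in the middle of an I/O operation: it stops only at the points where it asks the context, and the exception propagates to the caller as an ordinary RuntimeException.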
System properties used for customizing AlgART arrays
Behavior of AlgART arrays depends on some system properties, that allow to customize many important aspects.
Below is the list of these properties.
"net.algart.arrays.maxAvailableProcessors"
Defines the maximal number of processor units, that are permitted to be used simultaneously
by AlgART libraries for any goals. Namely, AlgART libraries never directly use
the system value, returned by Runtime.getRuntime().availableProcessors() call,
but use a minimum from that value and the value, specified in this property.
The default value depends on JVM: on 64-bit JVM it is 256, on 32-bit it is only 8.
If it is not suitable, please specify another value (from 1 to 1024).
See Arrays.SystemSettings.availableProcessors()
for more details.
Defines the maximal amount of usual Java memory, in bytes, which can be freely used
by methods, processing AlgART arrays, for internal needs and for creating results.
May contain any non-negative long value.
Default value is 33554432 (32 MB).
See Arrays.SystemSettings.maxTempJavaMemory() for more details.
Defines the maximal size of memory block, in bytes, that should be processed in several threads
for optimization on multiprocessor or multi-core computers.
May contain any positive long value.
Default value is 1048576 (1 MB).
See Arrays.SystemSettings.maxMultithreadingMemory() for more details.
"net.algart.arrays.maxMappedMemory"
Defines the maximal amount of system memory, in bytes, allowed for simultaneous mapping
by DefaultDataFileModel class.
May contain any non-negative long value.
Default value is 536870912 (512 MB).
See DefaultDataFileModel.maxMappedMemory() for more details.
"net.algart.arrays.globalThreadPoolSize"
Defines the number of threads in the global system thread pool that will be used for multithreading optimization.
If zero or negative, then the thread pools will be created on demand.
If this property does not exist, the global thread pool with
Arrays.SystemSettings.availableProcessors()*MULT+1
threads (default value) will be used, where MULT is an integer value of
"net.algart.arrays.globalThreadPoolsPerCPU"
system property. See DefaultThreadPoolFactory.globalThreadPool()
for more details.
"net.algart.arrays.globalThreadPoolsPerCPU"
Helps to define the number of threads in the global system thread pool
if "net.algart.arrays.globalThreadPoolSize" system property does not exist: see above.
Defines whether the algorithms, processing AlgART arrays, should write to logs some timing information.
May be "false" or "true". Default value is identical to "-ea" JVM flag:
if java was called with "-ea" flag (assertions are enabled), the default profiling mode is true,
in other case it is false.
See Arrays.SystemSettings.profilingMode() for more details.
Defines whether AlgART mapping manager in the
default data file model
should increase the file size via standard I/O API, or it is increased automatically
as a result of new mappings.
Default value is false.
See DefaultDataFileModel.autoResizingOnMapping() for more details.
All these properties, except "net.algart.arrays.CPUCount",
are loaded during initialization of the corresponding classes.
So, any changes of them will be applied only at the next start of the Java application.
Note: all properties containing integer values, except
"net.algart.arrays.CPUCount",
can contain a suffix K, M, G, T
(or k, m, g, t),
which means that the integer value specified before this suffix is multiplied by
1024 (2^10, "Kilo"), 1048576 (2^20, "Mega"), 1073741824 (2^30, "Giga")
or 1099511627776 (2^40, "Tera") correspondingly.
For example, you can specify -Dnet.algart.arrays.DefaultDataFileModel.bankSize=64m
to set the bank size to 64 MB.
There are some other system properties, starting with "net.algart.arrays." substring,
used for internal goals. They are undocumented and should not be used in your applications.
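The suffix rule described above is easy to reproduce. The following is a minimal sketch, assuming a hypothetical helper class SizeProperty; it is not the parser used by this package, and the fallback to the default value for missing or malformed properties is an assumption of the sketch, not documented behavior.

```java
// Hypothetical helper for illustration; not the parser used by net.algart.arrays.
final class SizeProperty {
    // Parses values like "512", "64m" or "2G" using the multipliers 2^10, 2^20, 2^30, 2^40.
    static long parse(String text) {
        String s = text.trim();
        if (s.isEmpty()) {
            throw new NumberFormatException("empty value");
        }
        long multiplier = 1;
        switch (Character.toUpperCase(s.charAt(s.length() - 1))) {
            case 'K': multiplier = 1L << 10; break;
            case 'M': multiplier = 1L << 20; break;
            case 'G': multiplier = 1L << 30; break;
            case 'T': multiplier = 1L << 40; break;
            default: break; // no suffix: the whole string is the number
        }
        if (multiplier != 1) {
            s = s.substring(0, s.length() - 1);
        }
        // multiplyExact throws ArithmeticException instead of silently overflowing.
        return Math.multiplyExact(Long.parseLong(s), multiplier);
    }

    // Reads a system property; the fallback to defaultValue on errors is an assumption.
    static long get(String name, long defaultValue) {
        String value = System.getProperty(name);
        if (value == null) {
            return defaultValue;
        }
        try {
            return parse(value);
        } catch (RuntimeException e) {
            return defaultValue;
        }
    }
}
```

For example, SizeProperty.parse("64m") gives 67108864, the number of bytes in 64 MB.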
Built-in logging
Some classes implementing AlgART arrays log some situations via standard java.util.logging tools.
Now 3 loggers are used:
While executing Arrays.copy(ArrayContext, UpdatableArray, Array)
and Arrays.copy(ArrayContext, UpdatableArray, Array, int) methods,
if Arrays.SystemSettings.profilingMode() returns true
and the execution time is long enough.
In this case, these methods log the time of copying and a short description of the source array
(generated by its toString() method).
Please note that these methods underlie many array processing algorithms,
which create some "lazy" array view and then actualize it via copying into a new array.
So, these 2 methods are often the main methods that should be profiled.
When LargeMemoryModel is initialized and the system property
net.algart.arrays.LargeMemoryModel.dataFileModel contains an illegal name of a data file model.
See the method LargeMemoryModel.getInstance().
In finalization code and built-in shutdown hook, if some error occurs while releasing
mappings,
that usually means flushing all non-saved data to disk.
In finalization code and built-in shutdown hook, if an attempt to delete a data file
via DataFileModel.delete method leads to exception
other than java.io.IOError (or net.algart.arrays.IOErrorJ5 under JDK 1.5).
Usually it means incorrect implementation of the custom overridden implementation of this method.
The level WARNING:
In finalization code, if we try to delete a temporary data file
by the DataFileModel.delete(net.algart.arrays.DataFile) method many times, but it's impossible.
This situation occurs rarely and is not normal: the finalization code is performed only
when no instances of AlgART arrays use this data file and all mappings are already finalized,
so the file should normally be deleted. Usually, if this situation nevertheless occurs,
the deletion of this file is just rescheduled to the next garbage collection.
However, if there were already a lot of attempts to delete this file,
this fact is logged with WARNING level and deletion of this file by finalization code is canceled.
(The file will probably still be deleted by the built-in shutdown hook.)
The level CONFIG:
In built-in shutdown hook, when we delete a temporary data file.
Both normal deletion and failure of deletion are logged with the same CONFIG level, but with
different messages. (More precisely, the situations are logged when the
DataFileModel.delete(DataFile) method returns
true or throws an exception. If this method returns false, which means that
the file was already deleted, this situation is not logged or is logged with lower levels.)
The failure while deletion is not too serious problem, because the list of all non-deleted
temporary files can be retrieved by DataFileModel.allTemporaryFiles() method
and saved for further deletion in your own shutdown task installed by
Arrays.addShutdownTask(Runnable, TaskExecutionOrder).
Maybe, in some other situations.
Tasks that are better solved by standard Java collections
The following tasks are not well solved in this architecture: please use
standard Java libraries in these cases.
AlgART arrays are always oriented to random access: there is no analog of the standard
java.util.LinkedList
class.
AlgART arrays do not support synchronization and, therefore, may not be thread-safe.
There is an exception: all immutable arrays,
as well as most of all immutable objects, are thread-safe.
All AlgART arrays are thread-compatible:
there are no problems to use an external synchronization to provide simultaneous access
to any kind of arrays from several threads. See above
the precise specification of array behaviour in a case of multithreading access.
AlgART arrays do not support serialization (do not implement java.io.Serializable
interface). However, the Large memory model
provides easy storing arrays in external mapped files, that allows to implement an efficient alternative
to standard serialization mechanism.
A simple pool of the unresizable AlgART arrays
(usually work buffers) with the same size and type of elements,
based on a list of SoftReference or WeakReference.
The class simplifying the parallel processing of a large AlgART array in several threads,
where each thread processes a set of ranges of the source array (Array.subArray).
A set of static methods for getting some important global settings,
stored in system properties and used for customizing modules processing AlgART arrays.
The memory model allowing to create combined arrays:
special kind of AlgART arrays, that store an array of Java objects with minimal amount of memory,
namely in one or several another "parallel" arrays.
Data buffer: an interface allowing to read and write blocks
from / to some linear data storage, containing a sequence of elements
of any Java type, with maximal performance.
Unchecked exception thrown by DataBuffer methods from(),
to() and cnt(), if the values that should be returned
by these methods are greater than Integer.MAX_VALUE.
Histogram: an array of non-negative integer numbers b[v], 0≤v<M,
where every element b[v] represents the number of occurrence of the value v
in some source array A, consisting of integer elements in 0..M−1 range.
Implementation of ArrayExchanger, that simultaneously exchanges two pairs of elements at the same
positions in two arrays: some byte[] array and some int[] array.
Implementation of ArrayExchanger, that simultaneously exchanges two pairs of elements at the same
positions in two arrays: some char[] array and some int[] array.
Implementation of ArrayExchanger, that simultaneously exchanges two pairs of elements at the same
positions in two arrays: some double[] array and some int[] array.
Implementation of ArrayExchanger, that simultaneously exchanges two pairs of elements at the same
positions in two arrays: some float[] array and some int[] array.
Implementation of ArrayExchanger, that simultaneously exchanges two pairs of elements at the same
positions in two arrays: some int[] array and some int[] array.
Implementation of ArrayExchanger, that simultaneously exchanges two pairs of elements at the same
positions in two arrays: some long[] array and some int[] array.
Implementation of ArrayExchanger, that simultaneously exchanges two pairs of elements at the same
positions in two arrays: some short[] array and some int[] array.
Unchecked exception thrown by some methods, processing several AlgART arrays or matrices,
when the passed arrays / matrices have different lengths / dimensions.
Summing histogram: an extension of Histogram class, allowing quick calculation of sums
of all elements of the sorted source array A[k] with indexes, lying in some range
r1≤k≤r2, or with values, lying in some range
v1≤A[k]≤v2.
The helper class for static methods of SummingHistogram class,
calculating the integrals of v(r) function between
two given values: minValue≤v≤maxValue.
AlgART array of any fixed-point primitive numeric, character or bit elements
(byte, short, int, long, char or boolean),
read/write access, no resizing.