Monday, January 29, 2018

Grids - Notes for an implementation

This post talks about the same subject already analyzed in a previous post but from a slightly different point of view, hoping to add clarity to the concepts. We assume to already have the grid delineated, as for instance the one in Figure. Some other program or someone else provided to us. All the information is written in a file, maybe in a redundant form, but it is there and we just have to read it.
Assume we are talking about a three-dimensional grid. Nodes, edges, faces, and volume are identified by a number (key, label) which are specified in the grid’s file.

Therefore the problem is to read this file and implement the right (Java) structures/objects to contain it, keeping in mind that our goal, besides to upload the data in memory is to estimate the time marching of a variable $A$ (and, maybe some other variable) in a given volume. Its time variation depends on fluxes of the same quantity (mass, to simplify) that are localised at the face that constitute the boundary of the volume.


Getting the things done

The simplest thing to do is then, to associate a vector whose entries are the values of $A$ for any of the volumes in the grid. Let say, that forgetting any problem which could be connected with execution speed, caching [1], boxing-unboxing of variables, we use a Hashmap to represent these values.
We will use also a Hashmap to contain the fluxes in each face. This hasmap contains $F$ elements: as many as the number of faces. The file from which we started contains all of this information and therefore we do not have any problem to build and fill these “vectors”.
Let’s give a look to what our system to solve can look like. The problem we have to solve changes but, schematically it could be:
For any volume (we omit the index, $i$ of the volume for simplicity):
$$ A^t = A^{t-1} + \sum_l a_l^{t} *i_l^{t}*f_l^t/d_l $$
Where:
$t$ is time (discretized in steps) and $t-i1$ is the previous step;
$l$ is the index of faces belonging to the volume
$d_l$ is the distance between the centroids of the two volumes that share the same face;
$i_l$ is a sign, +1 or -1, which depends on the volume and the face we are considering (volume index omitted);
$a_l$ is the area of the face $l$ or some function of of it.

For generality, the r.h.s. member of the equation is evaluated at time $t$, i.e. the equation is assumed to be implicit, but at a certain moment of the resolution algorithm, the function will be expressed as depending of some previous time (even if from the point of view of internal iterations). For a more detailed case than this simplified scheme, please see, for instance [2].
The Hashmap of $A$ contains the information about the number of volumes, i.e., $V$.
(I) an indication of the faces belonging to each volumeIl vettore (hash map) and
(II) the information about which volumes are adjacent
To obtain this, we have to store information about the topology of our grid. In the previous posts, we tried to investigate and answer to the question: which is the most convenient to store these informations ? (Right, more from a conceptual point of view than from a practical one).
From our previous analysis, we know that that for encoding the number of faces for any volume, we have to introduce a second (2) container that has has many position as the number of volumes, and for any volume a variable number of slots, each for any face of that volume (if the grids is composed by volumes of the same shape, the latter number of slots is constant for the internal elements of the grids, and variable just for the boundary volumes).
In this preliminary analysis, a Hasmap seems appropriate to contain this information, letting, for the moment, unspecified what types or objects contains this topology Hashmap, but eventually, they will contain a key or a number which identifies in a unique way a given face.
In this way the information about any face is present in two slots, belonging to the volumes that share the same face.
We have then the various quantities to store in each face:

  • $a_l$ (3) 
  • $f_l$ (4) 
  • $d_l$ (5) 


Anyone of the above quantites require a container with as many elements as the faces. We could, then, use three Hasmaps, whose indexes (keys) coincide with the numbers (keys) that in the topology Hasmap (2) realate faces to volumes.
To elaborate our equation we need then five containers, of which the topology one has a structure to be specified later. Well, actually all the hashmap internals has to be specified.
The elements of $a$ and $d$ are geometrical quantities that can - and has- to be specified outside the temporal cycle, if the grid structure is not modified during the computation. However, to be estimated they require further topological information that we still do not have (but can be in the grid file).
To estimate faces’ area, we need to know the nodes of the grid [3] which can be a sixth (6) container, and the way they are arranged in the faces, which is a seventh (7) container. Since the choices we did, we still choose to use Hashmaps to contain them. The Hashmap of nodes just contains the number (or the key) of nodes (and is, maybe, in most problems, pleonastic). The Hasmap of faces need to contain the arrangement of nodes, ordered in one of the two direction (left-hand -rule or right-hand-rule, clockwise and counterclockwise depends on the side you observe the face, so what is clockwise from a volume is counterclockwise for the other).
The (7) container has to have as many elements as the faces and each element contains the ordered nodes (a link, a reference, to). To estimate the area of the faces we need actually the geometry of the nodes, meaning their coordinates in some coordinate system. Usually, in most of the approaches, nodes are directly identified by their coordinates, which therefore are inserted directly (in the appropriate way) in container (7) instead that the link/reference to nodes' number (key, label).
However, I think that probably keeping the geometry separated from topology could be useful, because topology has its own scope, for instance in guiding the iteration in the summation that appears in our template equation.
Therefore we need a further container (the eight, 8) for the geometry, containing the coordinates of points. This container has $N$ elements, as many are the nodes.
The container of distances, d, to be filled needs to know between which volumes distances have to be calculated. This information, about volumes adjacency, needs another, further container (the nineth, 9) with length as the faces, i.e. with F elements. Every element, in turn, must contains the index of the elements between which is estimated.
This information that goes into the container 9, should already be in the file from which we are reading all the information. However, we should recover it by scanning all the volumes and finding which have a face in common. The latter, is a calculation that can be made off-line and we can, in any case consider it an acquired.
At this stage, we do not have much information about $f_l$. Certainly it will need to know which are adjacent volumes and requires the knowledge in container (9). Because $f_l$ is time varying it implies that information in (9) has to be maintained all along the simulation.
Every other information will require a further container. To sum up, we have a container of:

  1. quantity A;
  2. topology of Volumes;
  3. the area of faces;
  4. fluxes;
  5. distances between volumes’ centroids;
  6. nodes number (label, key)
  7. nodes that belong to a face
  8. coordinates of nodes
  9. topology of faces (referring to the volumes they separate)

Towards generalizations that look to information hiding and encapsulation


We can observe that we have three types of containers: the ones which contain topological information (2,6,7,9), those which contain physical quantities (1,4), those which contains geometric quantities (3,5,8).
If, instead than a 3D problems, we would have a 2D or 1D one, the number of container change, but not their types.
To go further deep, the first problem to deal with could be to understand how, in the topology container, for instance of volumes (2) how to make room for the slots indicating their faces, since they are of variable dimension. In traditional programming, usually they would have adopted a “brute force” approach: each slot would have been set to have the dimension of the larger number of elements to be contained. The empty element replaced by a conventional number to be check. Essentially all of it would have resulted in a matrix whose rows (columns) would correspond to the the number of elements (volumes, faces) and whose columns (rows) to the variable number of elements they contain (in the case of volumes, faces; in the case of faces, edges, and so on).
In a OO language, like Java, the sub-containers of variable dimensions can be appropriate object, for instance called generically “cell” containing an array of int[ ]. Therefore the global container of a topology could be a hashmap of cells.
In principle we could use the container defined above without any wrapper, directly defining them in term of standard objects in the library of Java 9.
However, we would like, maybe, to use other types eith resepcts to those we defined. For instance, in some cases, for speed reasons, we could substitute ArrayList to Hasmap or, someone of us, working on the complexity of caching could come out with some more exotic objects.
To respond to these cases, we would like then to introduce some abstraction which, without penalizing (too much) performances. Sure, we can define wrapper classes, for instance:
  • for topologies (essentially used to drive iterations)
  • for geometries
  • for physical quantities (used to contains data, immutable for parameters, and time-varying for variables)
These three classes would allow to fit all the cases for any dimension (1D, 2D, 3D): just the number of topology element would be varying.
However, this strategy could not be open enough to extensions which do not require breaking the code (be closed to modifications).
Using instead of classes, interfaces or abstract classes could be the right solution.
Classes, or BTW, interfaces could have also the added value to contain enough field to specify the characteristics of the entities, (es. if they work in 2D or 3D, their “name”, their units, all those type of information requested by the Ugrid convention). All these types of information are, obviously, also useful to make the programs or the libraries we are going to implement more readable and easier to be inspected by an external user.
While the topology class is self-explanatory, the geometry class (interface) has a connection to its topology. Therefore the geometry class should contain a reference to its topology to make explicit its dependence. A quantity object, for the same reason, should contain a reference to both its topology and its geometry.
The simplicity to use classes directly could be tantalizing, however, the investment for generality made by interposing interfaces or abstract classes is an investment for future.
Berti [3] advise, in fact, to separate the algorithm from the data structure, allowing therefore to write a specific algorithm once forever, and changing the data it uses, as we like. This would be a ideal condition maybe impossible to gain, but working to maintain in any case the possible changes in limited parts of the codebase is an add value to keep as reference. That is why “encapsulation” is one of paradigms of OO programming.

Some final notes

1 - In using cw-complexes to manage topology there could be overhead for speed. For instance, for accessing the values in a face of a volume, vi have to

access the volume,
access the address of the face
redirect to the appropriate quantity container to access the value
It could be useful then to eliminate one phase and once accessing the volume, having directly associate to it not the address of of the faces but the values contained in it.
If we have more than one value for face to access, related to different quantities and parameters, than maybe this added computational overhead could be considered negligible with respect to the simplicity of management of many quantities. In any case, an alternative to test.

2- At any time step, it is not only requested the quantity at time $t$, $A^{t}$, but also at the previous time, $t-1$, $A^{t-1}$. The two data structures share the same topology (which could represent a memory save). During time marching an obvious attention that the programmer needs to have is not to allocate a new grid to any time step. We can limit ourself to use only two grids across the simulation.

As an example, let us assume that time $t-1$ is going to be contained in vector $A^1$ and time $t$ in $a^2$. Then the above requirement could be obtained by switching the two matrixes as schematized as follows:
  • Create A1and A2,
  • Set A1 to initial conditions
  • For any t
  • A2=f(A1)
  • cwComplex.switch(A1,A2)
The switch method exchanges the names, but does now write anything in memory of $A^1$ and $A^2$. It could be schematised as follows
  • cwComplex.switch(A1,A2)
  • B = A1;
  • A1=A2;
  • A2=B;
It is clear that, in this way, all the vectors are always filled by values, while, for some operation, cleaning them could be worth.


3 - At the core of the method os solution of the equation under scrutiny, there could usually be a Newton method, e.g. [3], Appendix A, equation A8. Any efficiency improvement for the solver is then reduce to improve the speed of this core, that, eventually can be parallelised.


Bibliography


[1] - Lund, E. T. (2014). Implementing High-Performance Delaunay Triangolation in Java. Master Thesis (A. Maus, Ed.).

[2] - Cordano, E., and R. Rigon (2013), A mass-conservative method for the integration of the two-dimensional groundwater (Boussinesq) equation, Water Resour. Res., 49, doi:10.1002/wrcr.20072.

[3] - O'Rourke, J., Computational geometry in C, Cambridge University Press, 2007

[4] - Berti, G. (2000, May 25). Generic Software Components for Scientific Computing. Ph.D. Thesis

No comments:

Post a Comment