Tuesday, November 14, 2017

Open Science

Nothing really original in this post. I just recollect what is already said in the FosterOpenScience web pages. Their definition is:

"Open Science represents a new approach to the scientific process based on cooperative work and new ways of diffusing knowledge by using digital technologies and new collaborative tools (European Commission, 2016b:33). The OECD defines Open Science as: “to make the primary outputs of publicly funded research results – publications and the research data – publicly accessible in digital format with no or minimal restriction” (OECD, 2015:7), but it is more than that. Open Science is about extending the principles of openness to the whole research cycle (see figure 1), fostering sharing and collaboration as early as possible thus entailing a systemic change to the way science and research is done." The wikipedia page is also useful.

This approach goes along with that of doing reproducible research, which I have already talked about several times. I do not have very much to add to what they wrote, but I also want you to note that "there are in fact multiple approaches to the term and definition of Open Science, that Fecher and Friesike (2014) have synthesized and structured by proposing five Open Science schools of thought".

In our work a basic assumption is that openness also requires the appropriate tools, and we are working hard to produce them and to use those by others that make a scientific workflow open.

Friday, November 10, 2017

About Benettin et al. 2017, equation (1)

Gianluca (Botter), in his review of Marialaura (Bancheri)'s Ph.D. Thesis, brought to my attention the paper Benettin et al. 2017. A great paper indeed, where a couple of ideas are clearly explained:

  • SAS functions can be derived from the knowledge of travel and residence time probabilities
  • a virtual experiment where they show that traditional pdfs (travel time pdfs) can be seen as the ensemble of the actual time-varying travel time distributions.

The paper is obviously relevant also for its hydrological content, but that is not the point I want to argue a little about. Here I just want to discuss the way they present their first equation.

SAS stands for StorAge Selection functions; they are defined, for instance in Botter et al., 2011 (with a little difference in notation), as:
$$
\omega_x(t,t_{in}) = \frac{p_x(t-t_{in}|t)}{p_S(t-t_{in}|t)} \ \ \ (1)
$$
as the ratio between the travel time probability related to output $x$ (for instance discharge or evapotranspiration) and the residence time probability.
In the above equation (1)
  •  $\omega_x$ is the symbol that identifies the SAS
  • $t$ is the clock time
  • $t_{in}$ is the injection time, i.e. the time when water has entered the control volume
  • $p_x(t-t_{in}|t)$ with $x \in \{Q, ET, S\}$ is the probability density that a molecule of water that entered the system at time $t_{in}$ is found in the control volume, $S$, or revealed as discharge, $Q$, or evapotranspiration, $ET$, at time $t$
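To fix ideas, here is a minimal numerical sketch of equation (1), with made-up exponential pdfs (all names and numbers are illustrative assumptions, not values from the papers cited):

```python
import numpy as np

# Toy discretization of equation (1): ages T = t - t_in on a regular grid,
# with invented exponential pdfs used only to illustrate the ratio.
T = np.linspace(0.0, 200.0, 2001)        # ages (say, days)
p_Q = np.exp(-T / 10.0) / 10.0           # travel time pdf of discharge (toy)
p_S = np.exp(-T / 25.0) / 25.0           # residence time pdf (toy)

omega_Q = p_Q / p_S                      # SAS function, equation (1)
print(omega_Q[0])                        # 2.5 > 1: discharge preferentially
                                         # samples the young water in storage
```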

Equation (1) in Benettin et al. is therefore written as
$$
\frac{\partial S_T(T,t)}{\partial t} + \frac{\partial S_T(T,t)}{\partial T} = J(t) - Q(t) \Omega_Q(S_T(T,t),t)-ET(t) \Omega_{ET}(S_T(T,t),t) \ \ \ \ (2)
$$
Where:

  • $T$ is the residence time (they call it water age, but this could be a little misleading because, by their own theory, the age of the water could be different in storage, discharge and evapotranspiration)
  • $S_T$ is the age-ranked storage, i.e. “the cumulative volumes of water in storage as ranked by their age” (I presume the word “cumulative” implies some integration. After thinking a while and looking around, also at the paper by van der Velde et al., 2012, I presume the integration is over all the travel times up to $T$, which, since the variable of integration in my notation is $t_{in}$, means that $t_{in} \in [t-T, t]$)
  • $J(t)$ is  the precipitation rate at time $t$
  • $Q(t)$ is the discharge rate at time $t$
  • $\Omega_x$ are the integrated SAS functions, which are derived more extensively below.

In fact, this (2) should be just a version of equation (9) of Rigon et al., 2016, integrated over $t_{in}$:
$$
\frac{ds(t,t_{in})}{dt} = j(t,t_{in}) - q(t,t_{in}) -et(t,t_{in})
\ \ \ \ (3)
$$
where:
  • $s(t,t_{in})$ is the water stored in the control volume at time $t$ that was injected at time $t_{in}$
  • $j(t,t_{in})$ is the water input, which can have age $T=t-t_{in}$
  • $q(t,t_{in})$ is the discharge that exits the control volume at time $t$ and entered the control volume at time $t_{in}$
  •  $et(t,t_{in})$ is the evapotranspiration that exits the control volume at time $t$ and entered the control volume at time $t_{in}$
In terms of the SAS and the formulation of the problem given in Rigon et al. (2016), the $\Omega$s can be defined as follows:
\begin{equation}
\Omega_x(T,t) \equiv \Omega_x(S_T(T,t),t) := \int_{t-T}^t \omega_x(t,t_{in})\, p_S(t-t_{in}|t)\, dt_{in} = \int_0^{P_S(T|t)} \omega_x(P_S,t)\, dP_S
\end{equation}
Where the equality ":=" on the l.h.s. is a definition, so the $\Omega$s ($\Omega_Q$ and $\Omega_{ET}$) are this type of object. The identity $\equiv$ stresses that the dependence on $t_{in}$ is mediated by a dependence on the cumulative storage $S_T$, where $T$ is the travel time. As soon as $T \to \infty$, $\Omega_x \to 1$ (which is what is written in equation (2) of Benettin's paper). This is easily understood because, by definition, ${\omega_x(t,t_{in})\, p_S(t-t_{in}|t)} \equiv {p_x(t-t_{in}|t)}$ are probability densities (as deduced from (1)).
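The limit can be checked numerically with the toy pdfs introduced above (again, everything here is invented and purely illustrative):

```python
import numpy as np

# Toy check that Omega_x -> 1 as T -> infinity: since
# omega_Q(t, t_in) * p_S(T|t) = p_Q(T|t) is a pdf of the age T,
# its cumulative integral over ages, i.e. Omega_Q, tends to one.
T = np.linspace(0.0, 200.0, 2001)
dT = T[1] - T[0]
p_Q = np.exp(-T / 10.0) / 10.0            # toy travel time pdf
p_S = np.exp(-T / 25.0) / 25.0            # toy residence time pdf
omega_Q = p_Q / p_S                       # equation (1)

Omega_Q = np.cumsum(omega_Q * p_S) * dT   # cumulative over ages up to T
print(Omega_Q[-1])                        # ~ 1.0 for large T
```
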
An intermediate passage to derive (2) from (3) requires making explicit the dependence of the age-ranked functions on the probabilities. From the definitions given in Rigon et al., 2016, it is
$$
\frac{d [S(t)\, p_S(t-t_{in}|t)]}{dt} = J(t)\, \delta(t-t_{in}) - Q(t)\, p_Q(t-t_{in}|t) - ET(t)\, p_{ET}(t-t_{in}|t)
$$
which is equation (14) of Rigon et al., 2016.
Now integration over $t_{in} \in [t-T, t]$ can be performed to obtain:
$$
S_T(T,t):= \int_{t-T}^{t} s(t,t_{in})\, dt_{in}
$$
and, trivially,
$$
J(t) = J(t) \int_{t-T}^{t} \delta(t-t_{in})\, dt_{in}
$$
while for the $\Omega$s see what I already said above.
The final step is to make a change of variables that eliminates $t_{in}$ in favor of $T := t-t_{in}$. This actually implies one last transformation. In fact:
$$
\frac{dS_T(t,T(t_{in},t))}{dt} =\frac{\partial S_T(t,T(t_{in},t))}{\partial t} + \frac{\partial S_T(t,T(t_{in},t))}{ \partial T}\frac{\partial T}{ \partial t} = \frac{\partial S_T(t,T(t_{in},t))}{\partial t} + \frac{\partial S_T(t,T(t_{in},t))}{ \partial T}
$$
since $\partial T/\partial t = 1$. Assembling all the results, equation (2) is obtained.

Note:
Benettin et al., 2017 redefine the probability $p_S$ as “normalized rank storage … which is confined in [0,1]”, which seems weird with respect to the Authors' own literature. In previous papers this $p_S$ was called backward probability and written as $\overleftarrow{p}_S(T,t)$. Now, probably, they have doubts that we are talking about a probability. In any case, please read it again: "normalized rank storage … which is confined in [0,1]”. Does it not sound unnatural that this is not a probability? Especially when you repeatedly estimate averages with it and come out with “mean travel times”? Operationally, it IS a probability. Ontologically, the discussion about whether there is really random sampling or not, because there is some kind of convoluted determinism in the formation of travel times, can be interesting, but it leads to a dead end. On the same premises we should ban the word probability from the stochastic theory of water flow which, since Dagan, has been enormously fruitful.

This long circumlocution looks to me like the symbol below,

or TAFKAP, which was used by The Artist Formerly Known As Prince when he had problems with his record company.
In any case, Authors should pay attention to this neverending tendency to redefine the problem, because it can look like what Fisher (attribution by Box, 1976) called mathematistry. This is fortunately not the case for the paper we are talking about. But then why not stick with the established notation?

The Authorea version of this blog post can be found here.

References 

Tuesday, October 31, 2017

Meledrio, or a simple reflection on Hydrological modelling - Part VI - A little about calibration

The normal calibration strategy is to split the data we want to reproduce into two sets:

  • one for the calibration phase
  • one for the "validation" phase
Let's assume that we have an automatic calibrator. It usually:
  • generates a set of model parameters,
  • estimates the discharges with the rainfall-runoff hydrological model and the given set of parameters,
  • compares what is computed with what is measured, using a goodness-of-fit indicator,
  • keeps the set of parameters that gives the best performance,
  • repeats the operation a huge number of times (using some heuristics to search for the best set overall).

This set of parameters is the one used for "forecasting" and

  • is now used against the validation set to check its performance.
However, my experience (with my students, who usually perform it) is that the best parameter set of the calibration procedure is usually not the best in the validation procedure. So I suggest, at least as a trial and for further investigation, to:

  • separate the initial data set into 3 parts (one for a first calibration, one for selection, and one for validation);
  • select the 1% (or x%, where x is left to your decision) of best performing parameter sets in the calibration phase (called the behavioural set); then sieve further the 1% of these that perform best in the selection phase;
  • this 1 per ten thousand (one over 10^4 of the original sets) is the one used in the validation phase.
The hypothesis to test is that this three-step way of calibrating usually returns better performances in validation than the original two-step one.
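A minimal sketch of the three-step sieve, assuming we already have goodness-of-fit scores of each parameter set on the calibration and selection subsets (all names and numbers below are invented stand-ins, not the output of a real calibrator):

```python
import numpy as np

rng = np.random.default_rng(42)
n_sets = 1_000_000                     # parameter sets tried by the calibrator

# Stand-ins for goodness-of-fit scores (higher = better), e.g.
# Nash-Sutcliffe efficiencies computed on the two subsets.
gof_calibration = rng.random(n_sets)
gof_selection = rng.random(n_sets)

# Step 1: behavioural set = best 1% in the calibration phase.
behavioural = np.flatnonzero(
    gof_calibration >= np.quantile(gof_calibration, 0.99))

# Step 2: of these, keep the best 1% in the selection phase
# (one over 10^4 of the original sets).
scores = gof_selection[behavioural]
kept = behavioural[scores >= np.quantile(scores, 0.99)]

# Step 3: the sets in 'kept' are the ones evaluated against the
# validation subset.
print(len(kept))                       # ~ n_sets / 10**4
```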

Sunday, October 29, 2017

Open Science Framework - OSF

And recently I discovered OSF, the Open Science Framework. My students told me that there exist many of these online tools that leverage the cloud to store data and help groups manage their workflow. However, OSF seems particularly well suited for groups of scientists, since it links to various science-oriented services, like Mendeley, Figshare, Github and others. An OSF “project” can contain writings, figures, code, data. All of this can be uploaded for free to their servers or be maintained in one of your cloud storages, like Dropbox or GoogleDrive.

To start, you can take one hour of your time to follow one of their YouTube videos, like the one below.

Their web page also contains some useful guides that do the rest (do not hesitate to click on icons: they contain useful material!). The first you can start with is the one about the wiki, a customizable initial page that appears in any project or sub-project. There are some characteristics that I want to emphasize here. Starting a new project is easy, and when you have learned how to do it, you have almost learned all of it. Any project can have subprojects, called “components”. Each component behaves like a project by itself, so when dealing with it, you do not have to learn anything really new. Any (sub)project can be private (the default) or public, separately, and therefore your global workflow can contain both private and public stuff.

Many people are working on OSF. For instance, Titus Brown's Living in an Ivory Basement blog has some detailed reviews of it. They also coded a command-line client for downloading files from OSF, which can be useful as well.

Wednesday, October 25, 2017

Return Period


Some people, I realised, have problems with the concept of return period. This is the definition in Wikipedia (accessed October 25th, 2017):
A return period, also known as a recurrence interval (sometimes repeat interval) is an estimate of the likelihood of an event, such as an earthquake, flood[1], landslide[2], or a river discharge flow to occur.
It is a statistical measurement typically based on historic data denoting the average recurrence interval over an extended period of time, and is usually used for risk analysis (e.g. to decide whether a project should be allowed to go forward in a zone of a certain risk, or to design structures to withstand an event with a certain return period). The following analysis assumes that the probability of the event occurring does not vary over time and is independent of past events.
Something that Wikipedia does not include is rainfall intensity. The first paragraph should then read something like:
"A return period of x time units, also known as a recurrence interval (sometimes repeat interval) is an estimate of the likelihood of an event, such as an earthquake, flood[1], landslide[2], rainfall intensity, a river discharge flow or any observable, to occur (or be overcome) on average every x time units."

Return period clearly involves a statistical concept, which is traced back to a probability, and a time concept, that is the sampling time.
Let us assume we have a sequence of data, for which, at the moment, the sampling time is unknown, composed of a discrete number, $n$, of data.
The empirical cumulative distribution function (ECDF) of the data is a representation of the empirical statistics for those data. Let $ECDFc$ be the complementary empirical cumulative distribution function, meaning $ECDFc(h) \equiv 1 - ECDF(h)$.
Let $h^*$ be one of the possible values of these data (not necessarily present in the sequence, but included in the range of experimental values). We are interested in the probability of $h^*$ being overcome. If $m$ is the number of times $h^*$ is matched or overcome, then
$$ ECDFc(h^*)= m/n $$
$$ECDF(h^*) = 1 - m/n$$
We can, at this point, assume that the ECDF resembles some probability function, but this is a further topic we do not want to talk about here. What we want to stress is that ECDFs (probabilities) are not automatically associated with a time. All the data in the sequence refer to different picks of a random variable, and these picks are not necessarily time-ordered, or could even have happened all at the same time. So the “frequencies" that can be associated with the above events are not time frequencies.
Now let us introduce time by saying that, for instance, each datum was sampled at a regular time step $\Delta t$, what I called “time units” above, and that, for practical reasons, we are not interested in the ECDF of the data but in knowing how frequently (in a clock-time sense) a value is repeated. So we can say that the total time of our record is
$$T = n\, \Delta t$$
and in this time span the number of times $h^*$ is overcome is (by construction)
$$m=ECDFc(h^*)*n$$
On average, along the record obtained, the time frequency on which values greater than $h^*$ are obtained is the empirical return period:
$$T_r:=\frac{T}{m} =\frac{n\, \Delta t}{ECDFc(h^*)\, n} = \frac{\Delta t}{ECDFc(h^*)}$$
So the empirical return period of $h^*$ is inversely proportional to the complementary ECDF at $h^*$, but, properly, there is a “$\Delta t$” to remind us that it is given in time units. One basic assumption in our definition is that the underlying probability is well defined, which it is not if climate change is in action. This is a delicate and well discussed topic*, but, again, not the core of this page.

There is a crucial initial step, the sampling of the data, which affects the final result. If the data in the sequence are, for instance, annual maxima of precipitation, then the return period is given in years. If the data were daily precipitation totals, then the return period would be given in days. And so on. Because the time unit usually has value “1” (but the dimension of a time), the numeric value of the return period is just the inverse of the ECDFc. We should not forget, however, that the equation contains a mute dimension. We are talking about times, not dimensionless numbers (probabilities).
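For concreteness, a minimal sketch of the whole computation, assuming the record is made of annual maxima (so that $\Delta t$ is 1 year; the values are invented):

```python
import numpy as np

# Invented record of annual precipitation maxima (mm): n = 10 years.
h = np.array([42.0, 55.0, 61.0, 38.0, 70.0, 49.0, 58.0, 45.0, 66.0, 52.0])
dt = 1.0                        # sampling time step: 1 year
n = len(h)

def return_period(h_star):
    """Empirical return period of threshold h_star, in units of dt."""
    m = (h >= h_star).sum()     # times h_star is matched or overcome
    return dt * n / m           # T_r = n dt / m = dt / ECDFc(h_star)

print(return_period(60.0))     # 3.33... years: overcome 3 times in 10 years
```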

Being Bayesian, probably you can introduce this in a different way. I let you as an exercise to do it.

* On the topic of stationarity, please have a look at:

Milly, P. C. D., Betancourt, J., Falkenmark, M., Hirsch, R. M., Kundzewicz, Lettenmaier, D. P., & Stouffer, R. J. (2008). Stationarity Is Dead: Whither Water Management? Science, 319, 1–2.

Montanari, A., and Koutsoyiannis, D., Modeling and mitigating natural hazards: Stationarity is immortal!, Water Resources Research, 50(12), 9748–9756, doi:10.1002/2014WR016092, 2014.

Serinaldi, F., & Kilsby, C. G. (2015). Stationarity is undead: Uncertainty dominates the distribution of extremes. Advances in Water Resources, 77(C), 17–36. http://doi.org/10.1016/j.advwatres.2014.12.013

Sunday, October 22, 2017

Simple models for hydrological hazard mapping

This contains the second talk I gave to high-school teachers at MUSE for the Life Project FRANCA. My intention was to show (under a lot of simplifying assumptions) how hydrological models work, and to give a few hints on which types of hydraulic models of sediment transport can be useful.
Clicking on the figure above you can access the slides (in Italian, but given a little time, I will provide a translation). In their simplicity, the slides are a storyboard for actions that could be taken in the SteepStreams project to provide an estimation of the hazards of the Meledrio river basin (and the other two selected catchments).

Friday, October 20, 2017

On some Hydrological Extremes

This is the talk given at MUSE for the Life FRANCA Project. Life FRANCA has the objective of communicating with people about hydrological hazards and risk. In this case, the audience was composed of high school teachers.


Clicking on the Figure you will be redirected to the presentation.

Wednesday, October 18, 2017

Using Colorblind friendly Plots

Brought to my attention by Michele Bottazzi. I rarely think about this. Yet it is important. Please refer to this Brian Connelly post:

Click on the figure to be redirected. BTW, this was the 500th post!🎉
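For Python users, a minimal sketch: recent matplotlib versions, if I am not wrong, ship a built-in colorblind-friendly style based on a Tableau palette, so being kinder to colorblind readers can cost a single line (Connelly's post covers palette choices in much more depth):

```python
import numpy as np
import matplotlib.pyplot as plt

# 'tableau-colorblind10' ships with matplotlib and replaces the default
# color cycle with a colorblind-friendly one.
plt.style.use('tableau-colorblind10')

x = np.linspace(0, 10, 200)
for k in range(4):
    plt.plot(x, np.sin(x + k), label=f'series {k}')
plt.legend()
plt.show()
```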

Tuesday, October 17, 2017

TranspirAction

This post contains the presentation given by Michele Bottazzi. His presentation looks forward to digging into the forecasting of transpiration from plants (and evaporation from soils) through lumped-parameter modelling. His findings will have a counterpart in our JGrass-NewAGE system.
The figure illustrates his intent to find a new, modern way to scale up leaf theories to canopy and landscape. The starting point is a recent work by Schymanski and Or, but it will, hopefully, go far beyond it. Click on the Figure to access his presentation.

An ML-based meta-modelling infrastructure for environmental models

This is the presentation Francesco gave for his admission to the third year of Ph.D. studies. He summarizes his work done so far and foresees his work for the next year.
Francesco's work is a keystone of the work in our group, since he sustains most of the informatics and our commitment to OMS3. Besides this, his two major achievements are: the building of the Ne3 infrastructure (an infrastructure inside an infrastructure!), which allows an enormous flexibility in our modelling, and the new road opened towards modeling discharges through machine learning techniques. But there are other connections he opens that are visible through his talk. Please click on the figure to access the presentation.

Sunday, October 15, 2017

A few topics for a Master thesis in Hydrology

After the series about Meledrio, I thought that each one of those posts actually identifies at least one Thesis topic:

Actually, each one of them could be material for more than one Thesis, depending on the direction we want to take. All the Thesis topics assume that JGrass-NewAGE is the tool used for the investigations.
Actually, there are some spinoffs of those topics:
  • Using machine learning to set part of the model inputs and/or
  • Doing hydrological modeling with machine learning
  • Preprocessing and treating (via Python or Java) satellite data as input to JGrass-NewAGE (a systematisation of some work done by Wuletawu Abera on the Posina catchment and/or the Blue Nile)
  • Implementation of the new version of JGrass-NewAGE on Val di Sole
  • Using satellite data, besides geometric features, to extract river networks
  • Snow model intercomparison (GEOtop and those in JGrass-NewAGE, with reference to the work done by Stefano Tasin and Gabriele Massera)
Others relate to other hydrological topics:
  • Mars (also here) and planetary Hydrology (with GEOtop or some of its evolutions which account for different temperature ranges and other fluid fluxes)
  • Coping with evapotranspiration and irrigation at various scales
  • Coupling the carbon cycle to the hydrological cycle (either in GEOtop or in JGrass-NewAGE)
Other possible topics regarding water management:
  • Hypothesis on the management of reservoir for optimal water management in river Adige.
  • Managing Urban Waters Complexity
Other possible topics regard a more theoretical (mathematical-physical) side:
On the side of informatics:
For those who want to work with us on a Master thesis, the rules to follow are those for Ph.D. students, even if to a minor extent. See here:

Saturday, October 14, 2017

Meledrio, or a simple reflection on hydrological modelling - Part V

Another question related to discharges is, obviously, their measurement. Is the discharge measurement correct? Is the stage-discharge relation reliable? Why not give confidence intervals for the measures? Yesterday a colleague of mine told me: a measure without an error band is not a measure. That is, obviously, an issue. But today's reflection is on a different question. We have a record of discharges. It could look like this (forgive me the twisted lines):
Actually, what we imagine is the following:
I.e., we think it is all water. However, a little reflection should make us think that a more realistic picture is:
Meaning that part of the discharge volume is actually sediment being transported around. This opens the issue of how to quantify it. The figure highlights that during some floods the sediment can actually be a consistent part of the volume and, if we are talking of small mountain catchments like Meledrio, it can be the major part of the discharge. So far hydraulics and sediment transport have been used separately from hydrology, and hydrology separately from sediment transport, but what people see is both of them (water and sediment).
This could still not be enough. The real picture could actually be like this:
Where we have some darker water. The mass transport phenomena, in fact, could affect part of the basin during intense storms, but the liquid water could be unable to sustain all this transport. Aronne Armanini suggested to me that, in that case, debris flows can start and be stopped somewhere inside the basin. The water content they carry, instead, could equally likely be released to the streams, further boosting the flood. Isn't it interesting? Who said that modeling discharges is a settled problem?

Friday, October 13, 2017

Meledrio, or a simple reflection on hydrological modelling - Part IV

An issue that often arises is the complexity of models. Assuming the same Meledrio basin, what is the simplest model we can think of for getting the water budget quantitatively?
The null-null hypothesis model is obviously using past averages to get the future. Operationally (a sketch follows the list):
  • Get precipitation and discharge 
  • Precipitation is separated by temperature (T) into rainfall (T>0) and snowfall. Satellite data can be used for the separation.
  • Take their averages (maybe monthly averages).
  • Take their difference.
  • Assume that the difference is 50% recharge and 50% ET.
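A sketch of this null-null bookkeeping, with invented monthly numbers in mm/month (and skipping the rain/snow separation for brevity):

```python
import numpy as np

# Invented monthly means (mm/month) of precipitation and discharge.
P = np.array([90., 80, 85, 100, 110, 95, 60, 55, 70, 120, 130, 100])
Q = np.array([50., 45, 50, 70, 85, 70, 40, 30, 35, 60, 75, 60])

residual = P - Q              # what does not run off, month by month
recharge = 0.5 * residual     # null-null assumption: 50% recharge ...
et = 0.5 * residual           # ... and 50% evapotranspiration
print(recharge.sum(), et.sum())   # yearly totals of the two fluxes
```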

My null hypothesis model is the following. I kept it simple, but not too simple:
  • Precipitation, discharge and temperature are the measured data.
  • Their time series are split into 2 parts (one for calibration and one for validation).
  • Precipitation is measured and separated by temperature (T) into rainfall (T>0) and snowfall (T<0). Satellite data can alternatively be used for the separation. These variables can be made spatial by using a Kriging (or
  • Infiltration is estimated by the SCS-CN method. SCS parameter intervals are set according to soil cover, distinguishing it qualitatively into 4 classes of CN (high infiltrability, medium-high, medium-low, low). In each subregion identified by soil cover, CN is allowed to vary in the range allowed by its classification. Soil needs to have a maximum storage capacity (see also ET below). Once this is exceeded, water goes to runoff.
  • Discharge is modeled as a set of parallel linear reservoirs, one per HRU (Hydrologic Response Unit); a discretized sketch follows this list.
  • Total discharge is simply the summation of all the discharges of the HRUs.
  • CN and the mean residence time (the parameter of the linear reservoirs) are calibrated to reproduce the total discharge (so a calibrator must be available).
  • A set of optimal parameters is selected.
  • Precipitation that does not infiltrate is separated into evapotranspiration, ET, and recharge.
  • ET is estimated with Priestley-Taylor (so you need an estimator for radiation), corrected by a stress factor linearly proportional to the water storage content. The PT alpha coefficient is taken at its standard value, i.e. 1.26.
  • What is not ET is recharge.  Please notice that there is a feedback between recharge and ET because of the stress factor. 
  • If present, snow is modeled through the Regina Hock model (paper here), possibly calibrated through MODIS.
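A discretized sketch of the reservoirs part (one linear reservoir per HRU, explicit Euler, invented forcing; the real components live, of course, in JGrass-NewAGE):

```python
import numpy as np

def linear_reservoir(recharge, t_c, dt=1.0, s0=0.0):
    """dS/dt = R - Q with Q = S / t_c; t_c is the mean residence time."""
    s, q = s0, []
    for r in recharge:
        q_t = s / t_c
        s += dt * (r - q_t)        # explicit Euler step of the storage budget
        q.append(q_t)
    return np.array(q)

# One reservoir per HRU, with per-HRU residence times (to be calibrated);
# the total discharge is simply the sum over the HRUs.
recharge = np.random.default_rng(1).random((3, 100))   # toy input, 3 HRUs
t_c = [5.0, 12.0, 30.0]
Q_total = sum(linear_reservoir(r, tc) for r, tc in zip(recharge, t_c))
```
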
The Petri Net representation of the model (without snow) can be figured as follows:

The setup of this model, therefore, is not so simple indeed, but not overwhelmingly complicated.

Any other model has to do better than this. If successful, it becomes hypothesis 1.
A related question is how we measure the goodness of fit, and whether we can distinguish the performances of one model from those of another. That is, obviously, another issue.

Thursday, October 12, 2017

Meledrio, or a simple reflection on hydrological modelling - Part III

Well, this is not exactly Meledrio. It starts a little downstream of it. In fact, we do not have discharge data for Meledrio (so far) and we want to anchor our analysis to something measured. So we have a gauge station in Malè. A gauge station, for those who do not know it, measures just water levels (stages), which are then converted to water discharge through a stage-discharge relation (see USGS here). Anyway, a sample signal is here:
The orange lines represent discharge simulated with one of our models (uncalibrated at this stage). The blue line is the measured discharge (meaning the measured stage after an unknown stage-discharge relationship has been applied, because the people who should have provided it did not). But look a little closer:
We could have provided a better zoom; however, the point of discussion is: what the hell is all that noise in the measured signal? Is it natural? Is it measurement error? Is it due to some human action?
With a better zoom, one could see that the signal is almost a square wave going up and down in a few hours, and therefore the suspected cause is humans.
Next question: how can we calibrate a model, which does not have this unknown action inside, to reproduce the measured signal?
Clearly the question is ill-posed and we should work the other way around: can we filter the effect of humans out of the measured signal?
Hints: we could try to analyze the measured signal first. Analyzing could actually mean, in this case, decomposing it, for instance into Fourier series or wavelets, and wiping away the square signal (a hint within the hints), reproducing an "undisturbed signal" to cope with.
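A sketch of the Fourier hint on a synthetic signal (a smooth flood wave plus a square-wave disturbance of daily period; everything here is invented):

```python
import numpy as np

t = np.arange(0.0, 512.0)                           # hours
natural = 5 + 3 * np.exp(-((t - 200) / 60) ** 2)    # smooth "flood" (toy)
square = 0.8 * np.sign(np.sin(2 * np.pi * t / 24))  # daily on/off disturbance
signal = natural + square

spectrum = np.fft.rfft(signal)
freqs = np.fft.rfftfreq(len(t), d=1.0)              # cycles per hour

# Wipe away the 24 h periodicity and its odd harmonics (square waves
# have only odd harmonics): a crude notch filter.
for k in (1, 3, 5):
    spectrum[np.abs(freqs - k / 24) < 1.5 / len(t)] = 0.0

cleaned = np.fft.irfft(spectrum, n=len(t))          # "undisturbed" signal
```
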
Then we could probably calibrate the model against the cleaned data. Ah! You do not know what calibration means? That is another story.

P.S. - This is actually part of a more general problem, which is the treatment of measurements. Often we naively treat them as true values. Instead they are not, and they should be pre-analyzed for consistency and validated beforehand. MeteoIO is a tool that answers part of these needs. But, for instance, it does not treat the specific question above.

Wednesday, October 11, 2017

Meledrio, or a simple reflection on hydrological modelling - Part II

In the previous studies made on the hydrology of Meledrio, some ancillary data were produced. For instance:

Soil Use
Geo-lithology

Usually other maps are also produced, for instance soil cover (which, in principle, could be different from soil use). The problem I have is that, usually, I do not know what to do with these data. There are actually two questions related to maps of this kind.
  • The first is: are these characteristics part of the model (see, for instance, the previous post)?
  • The second is: if the model somewhere contains a quantity, or a parameter, that can be affected by the mapped characteristic but is not directly the characteristic itself, how can the parameter be deduced? In other words, is there a (statistical) method to relate soil use to model parameters?
I confess that the only systematic attempt at this type of inference that I know of is pedotransfer functions. Whilst the concept could be exported to more general model attributes, they refer to very specific models that contain hydraulic conductivity or porosity as parameters, and not to other models, for instance those based on reservoirs, where hydraulic conductivity is usually not explicitly present.
Another type of sub-model where something similar exists is the SCS-CN model. Specific models can sometimes contain specific conversion tables produced either by the Authors or by practitioners (SWAT, for instance). In SCS-CN, tables of soil categories are associated with values of the Curve Number parameter, and people pretend to believe that the association is reliable. But it is fiction, not science.
In a time when reviewers say that modelling discharges is not enough to assess the validity of a hydrological model, they at the same time allow holes in the peer review process where papers make an unscrupulous use of the very same concept.
There is actually a whole new branch of science, hydropedology, that seems devoted to the task of transforming maps of soil properties into significant hydrological numbers (mine is a brutal interpretation of it; obviously hydropedology has the scope to understand, not only to predict), and I add below some relevant references. The analyses are fine and interesting food for thought, but the practical outcome is still scanty. Probably for two reasons: because normal statistical inference is not sophisticated enough to obtain important results (beyond pedotransfer functions), and because (reservoir-type) models have parameters that are too convoluted to be interpreted as a simple function of a mapped characteristic. An opportunity for machine learning techniques?
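To make the machine-learning question concrete, the simplest version would be an ordinary regression of a calibrated reservoir parameter on the mapped attributes, here sketched with purely synthetic data (everything below is invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200                                  # calibrated (sub)catchments (toy)

# Invented mapped attributes: forest cover fraction, mean slope, a CN-like
# index; the "calibrated" residence times t_c are synthetic as well.
X = np.column_stack([rng.random(n), rng.random(n), 40 + 50 * rng.random(n)])
t_c = 5 + 20 * X[:, 0] - 8 * X[:, 1] + rng.normal(0, 1, n)

A = np.column_stack([np.ones(n), X])     # design matrix with intercept
coef, *_ = np.linalg.lstsq(A, t_c, rcond=None)
print(coef)                              # a crude "pedotransfer-like" rule
```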

References

Lin, H., Bouma, J., Pachepsky, Y., Western, A., Thompson, J., van Genuchten, R., et al. (2006). Hydropedology: Synergistic integration of pedology and hydrology. Water Resources Research, 42(5), 2509–13. http://doi.org/10.1029/2005WR004085

Pachepsky, Y. A., Smettem, K. R. J., Vanderborght, J., Herbst, M., Vereecken, H., & Wösten, J. (2004). Reality and fiction of models and data in soil hydrology (pp. 1–30).

Vereecken, H., Schnepf, A., Hopmans, J. W., Javaux, M., Or, D., Roose, T., et al. (2016). Modeling Soil Processes: Review, Key Challenges, and New Perspectives. Vadose Zone Journal. http://doi.org/10.2136/vzj2015.09.0131

Vereecken, H., Weynants, M., Javaux, M., Pachepsky, Y., Schaap, M. G., & Genuchten, M. T. V. (2010). Using Pedotransfer Functions to Estimate the van Genuchten–Mualem Soil Hydraulic Properties: A Review. Vadose Zone Journal, 9(4), 795–27. http://doi.org/10.2136/vzj2010.0045

Terribile, F., Coppola, A., Langella, G., Martina, M., & Basile, A. (2011). Potential and limitations of using soil mapping information to understand landscape hydrology. Hydrology and Earth System Sciences, 15(12), 3895–3933. http://doi.org/10.5194/hess-15-3895-2011

Tuesday, October 10, 2017

Meledrio, or a simple reflection on hydrological modelling - Part I

The problem is well explained by the following figure, which represents the statistics of slopes in the Meledrio basin.
The overall distribution is bimodal, which makes us suspect that something is going on. In fact, this below is the Google view of the basin.
It clearly shows that the hydrographical right side of the basin (on the left in the figure) is the one with the steeper slopes, and the left side the one with the lower slopes. This is definitely shown by the slope map
(Please observe that the map is reversed with respect to the Google view, since there we were looking at the basin from the North). Different slopes would be associated, in our minds, with different runoff and subsurface water velocities. This would clearly be accounted for in a model like GEOtop, but not (at least explicitly) by a system of reservoirs, especially when we calibrate all the reservoirs together. A possible partition of the basin in the JGrass-NewAGE system is represented below
Because the single Hydrologic Response Units are mostly on one side of the catchment, they could be said to lie in an area which is homogeneous from the point of view of slope statistics. Therefore, when we treat the basin as a collection of reservoirs, in principle we could parameterise them differently, according to their slope. In practice, however, we do not have enough measurements to be able to do this separate calibration, and we look at the basin homogeneously. Are we not missing something?
Well, we are. The first thought would be to try to add to our reservoirs the knowledge gained from geomorphology, and assume that the mean travel time, or some relevant parameter connected to it, depends proportionally on the (mean) slope (or some power of it) and inversely on the distance water has to cross to get out of the HRU. This is obviously possible, and maybe we could easily try it.
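A toy reading of this guess, interpreting the proportionality as acting on the inverse of the travel time (the reservoir rate constant); the coefficient c and exponent b are invented and would have to be calibrated:

```python
def mean_travel_time(distance, slope, c=1.0, b=0.5):
    """Toy geomorphic guess: 1/t_c = c * slope**b / distance, i.e. t_c
    grows with the flow distance and shrinks with (a power of) the slope."""
    return distance / (c * slope ** b)

# Two HRUs with the same outflow distance but different steepness:
print(mean_travel_time(2000.0, 0.40))   # steep right side: faster response
print(mean_travel_time(2000.0, 0.10))   # gentler left side: slower response
```
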
In general, however, hydrologists who are not stupid do not care about it. Why? The reason can be that our assumption that slopes count is blurred by the heterogeneity of the other factors that concur to form the hydrologic response. However, the magnitude of the heterogeneity can be different at different scales, and it could be really nice to do some investigation in this direction.

Friday, October 6, 2017

SteepStreams preliminary Hydrological works

This contains the talk given at the 2017 meeting of the SteepStreams ERANET project. It deals with the hydrological cycle of the Noce river in the Val di Sole valley (Trentino, Italy). It is a preliminary view of what we are going to do in the project and does not pretend to present particularly deep results. However, it could give some interesting hints on methodology.
https://www.slideshare.net/GEOFRAMEcafe/lisbon-talk-for-steepstreams

Clicking on the figure, you can access the presentation. Here below you also find a more detailed summary, with links to material about the Meledrio basin, one of the experimental catchments used in the project.
As above, clicking on the figure, you can access the presentation.

Friday, September 29, 2017

Google Earth Engine

To say it all, I would prefer to have a non-profit organisation financed by states doing it. However, I cannot deny that the project is fascinating and offers (but I have to explore it) interesting possibilities. I am talking about the Google Earth Engine, of which I came to know a few hours ago.
Here you can watch a YouTube video about the project:


The page of the system is: https://earthengine.google.com/
Your opinions, thoughts and impressions of use would be interesting feedback.

Grids: Notes for an implementation

This contains some hints and discussion about how to implement Grids (which I learned to call CW-Complexes) in an Object Oriented language. Specifically, the discussion is made with Java in mind but is obviously not limited to it. These slides do not contain very much bibliography and are, by far, not a complete treatment of the subject. They hope to be, however, some useful "food for thought" to start with.

Clicking on the above figure you will be redirected to the presentation, which contains the seed for a deployment, at least in our overall system.

Tuesday, September 19, 2017

Evaluation of Dr. *** work and research. For getting tenure

It often happens that an Academic is asked to assess a peer's work. Below is an example of one such letter that I wrote recently. However, in boldface I put comments in a less traditional language that, I think, give more insight than the letter I actually wrote to the people who judge the researcher. Here you find it below:

The short answer to your Institution's request is: there is no doubt that Dr. *** deserves to get his/her tenured position; s/he is one of the best in the world at what s/he does.

C: The short answer is: this man/woman is damned good. S/he is far above the average and you would be stupid not to keep him/her.

The long assessment is easy too. My own research work intersects very much with Dr. ***'s. So I came to know and follow her/his production starting ten years ago. Her/His papers are among those I have cited most frequently in my recent research, and s/he is one of the two or three researchers younger than me whom I believe it is necessary to follow when working on hydrological modelling.

C: I do not need to read her/his papers. I already read and frequently cited her/him. So: s/he is good. S/he regularly publishes just in the best journals. S/he has a citation rate slightly higher than mine. So, maybe s/he is better than me. Stop.

S/he also publishes regularly with some of the best researchers in the field, and this does not usually happen to everyone.

C: S/he is well connected. The group s/he frequents is extremely good but tends to be self-referential. This exposes her/him to the threat of following the mainstream (which helps in publishing) and not being a paradigm breaker (which could be hard to sustain). However, if s/he is smart, as I believe, s/he will avoid the pitfalls of the situation.

Finally her/his DUPER environment is a milestone in recent literature.

C: Her/his framework is damned good. It is one of the main contenders to mine. If I were a super rampant guy, I would try to obstruct it. But I am a professor (sigh).

It happens, obviously, that I disagree on some details of her/his research, but this is a matter of normal dialectics between peers.

C: S/he made some choices different from mine and sustains them with some statements that I judge debatable. But compared to the shit I see around, there is by far no competition.

From the recent papers s/he chose for evaluation, I could see that s/he has also enlarged her/his view to using Bayesian techniques for estimating models' parameters.

C: S/he is able to understand where the fashionable stuff is. This attitude does not produce science by itself, but for sure it is one of the characteristics that good Academics must have at least a bit of (sometimes in order to avoid following fashionable stuff).

His/Her approach is sophisticated and certainly requires some deepening on my part, but I could appreciate its novelty.

C: The last paper is really technical, and if s/he insists too much in this direction s/he can fall into hydrological mathematistry.

Whilst her/his initial production also reflects %%%%'s view of the matter, it is quite clear by now what Dr. ***'s own contribution and evolution are.

C: Her/his former and influential boss largely determined her/his initial research, but s/he seems to have come out of her/his boss's old-fashioned approach, which was not up-to-date and could have brought her/him to produce crappy stuff. Which s/he did not.


To summarise, when s/he writes: "My research areas in catchment modelling can be broadly classified as: (1) the development of flexible models, which provide the building blocks for the construction of models, (2) the formulation of guidelines for model development, and (3) the use of models to interpret catchment behaviour and to produce reliable predictions. Through these research areas I have contributed to important developments in hydrological sciences in the past years, such as the use of models for hypothesis testing, the incorporation of experimental knowledge in the modelling process, and the understanding of large scale hydrological processes and their controls.”, s/he gives an image of her/himself that I fully share and think is appropriate.

C: S/he knows what s/he is and does. S/he is ready for her/his role.

S/he individuates three areas of progress for her/his research: a) flexible model development; b) theory of (hydrological) model development; c) understanding and prediction of catchment behavior.

C: This intersects with my activities; that's why I reformulated the name of her/his second area of interest. Mine, I think, is more appropriate.

Recently there was, among many colleagues, the idea that the topics of surface hydrology were mature and that research had to move to hybrid fields like socio-hydrology, eco-hydrology (with an accent on ecology more than hydrology), ecosystem services and so on.

C: Most colleagues, even gifted ones, tend to take for granted what is actually not (at all).

With respect to these new, certainly exciting directions, the focus of Dr. *** seems quite traditional. However, I fully approve of it. The kind of research s/he is pursuing is fundamental and necessary after years of blown dilettantism that has had relevant consequences in research and practice.

C: With respect to these topics most colleagues are either wrong or superficial. (Someone has to use these occasions to push out his frustration.)

Conclusions and extensions obtained from ill-conceived concepts, improperly used models, and lack of hypothesis testing bring wrong interpretations of hydrological facts, can have negative consequences on engineering applications, and can cause the choice of wrong policies.

C: They use wrong findings based on misconceptions and move to new subfields with wrong information. GIGO

Dr. ***'s work is important, and its importance is going to become more and more clear in the next years. Hopefully just a few model infrastructures will emerge from the present fragmentation, and I believe the one from the evolving work of Dr. *** will be in that group.

C: Just a few frameworks will survive when people finally understand the limits of hydrological dilettantism. Possibly the work of this researcher will survive, and for sure it will do so for the next decade.

Probably, if I were her/him, I would try to broaden the perspective a little, including, besides the hydrologic response, the whole set of hydrological processes that concern catchments' budgets more seriously, and, in particular, evapotranspiration, which in some environments covers even sixty per cent of the water budget.

C: Engineers, and even some ecohydrologists, are obsessed with discharges. These are just a part of the game. Future frameworks have to play the full game.

I would also devote some attention to the approach with travel time distributions, which could open new perspectives in the modeling of nutrients and pollutants, a field which Dr. *** has already come across.

C: S/he forgets some fundamental aspects concerning her/his own area of interest. S/he should not.

Finally, I have no doubt that Dr. *** will continue to improve and bring great contributions to your Institution.

C: I criticize her/him as I do with those I really like. S/he is a good person. Everybody can work with her/him and have benefits. They know that they want to keep her/him, and I appreciate it. This will make her/him freer and braver.

P.S. - They kept him/her.

Wednesday, September 13, 2017

A smooth introduction to some Algebraic Topology topics

Since the previous post on Grids touched some topics of algebraic topology, I selected a few sites where I found some information that is interesting from our point of view and that can help the reading of the previous slides and, in general, of the references already presented.

I do not share many of Tonti's opinions, but some of his talks and books are, indeed, enlightening.

Monday, September 11, 2017

Meshes, Grids, CW-Complexes

The representation of space (and time) is a necessary step to implement any Physics. However, the topic is seldom faced with the appropriate generality, and this is reflected in software implementations that do not have a general structure. This is the rationale for talking here about meshes or grids.
From a quick view of the material found in the literature, it appears that a lot of work has been done by a few people (groups). There are at least two pathways to follow. The first is the Ugrid mesh specification. We can describe it as the classification of meshes by power users, i.e. people who use meshes for describing (especially environmental) numerical problems. Their work concentrates on semantics and on explaining what meshes are, with the scope of inserting them into NetCDF, a self-describing file format conceived to contain environmental data.
The second approach, in Berti (2000), starts from more fundamental mathematical work, which is also used in Heinzl (2007; referring to the paper Heinzl, 2011, could be convenient).
In general, the first two chapters of Berti's dissertation are a must-read for those who deal with scientific computing.
A subsequent number of papers cover two topics: how to store meshes in databases and how to give these structures the right flexibility to be parallelized. Interestingly, some of the mathematical work actually flowed into the creation of C++ libraries, in particular the GRAL libraries, developed by Berti himself.

https://www.slideshare.net/secret/2TufpeFQeb62FR

Browsing around, meshes are rigorously defined in Algebraic Topology (e.g. Hatcher, 2001), and this was recognized in various papers since the sixties (e.g. Branin, 1966, and references therein). A general discussion, which involves the nature of Physical laws, was produced by Tonti (2013) and somewhat pushed also by Ferretti (2015). These treatments could bring some new insight and general understanding to the matter. What we did with the slides was to try to synthesize some of the above work, especially in view of an implementation of some Java libraries. The idea suggested by the readings was to use generic programming, design patterns (Gamma et al., 1994; Freeman et al., 2005) and programming to interfaces (which, BTW, we already had in mind: we found what we were looking for).
But the details of the implementation will be the topic of a future post (though you can have a glance at the literature by browsing the bibliography below). Now get (a little of) the theory by clicking on the figure above.

References
Notes

Some of my students asked for a somewhat milder introduction to algebraic topology. I dedicated a new short post to it.

Friday, September 8, 2017

Weather Generation (according to Korbinian Breinl)

How can one reasonably cope with simulating future hydrometeorological forcings for hydrological purposes? Clearly, meteorology is dominated by unpredictable phenomena (in the sense of chaotic ones), and we cannot pretend to simply use forecasts when we are looking just a little far ahead. An option would be to use climatic models and do a dynamic downscaling of their outcomes. The previous lecture, given by Jeremy Pal (GS), followed this research path. However, we can also produce statistical weather scenarios using stochastic weather generators (SWG), once we have an idea of what the mean characteristics of such a system will be.
The literature is full of SWGs that cover mainly temperature and rainfall, but there also exist systems that cover other meteorological variables, like wind and radiation.
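The occurrence/intensity core of many such generators is surprisingly small; here is a toy daily rainfall generator (a two-state Markov chain for wet/dry days plus exponential depths; all parameters are invented, and Korbinian's generator is of course far richer):

```python
import numpy as np

rng = np.random.default_rng(7)

p_wet_given_dry = 0.25     # invented transition probabilities
p_wet_given_wet = 0.60
mean_depth = 8.0           # mean rainfall depth on wet days (mm, invented)

wet, rain = False, []
for _ in range(365):       # one synthetic year, day by day
    wet = rng.random() < (p_wet_given_wet if wet else p_wet_given_dry)
    rain.append(rng.exponential(mean_depth) if wet else 0.0)

rain = np.array(rain)
print(rain.sum(), (rain > 0).mean())   # yearly total and wet-day frequency
```
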
Today we had a talk on the subject, given at our Department of Civil, Environmental and Mechanical Engineering by Korbinian Breinl. He is at present a post-doc at Uppsala University with Giuliano Di Baldassarre (GS), and we are collaborating in the SteepStreams project.

https://www.slideshare.net/secret/3mSM1qPLUj4BMR
As usual, you can find his presentation by clicking on the figure above.

Besides flooding (and solid flooding), which is one of the scopes of the project, I hope we succeed in modeling all the main components of the hydrological cycle by a combined use of Korbinian's generator and JGrass-NewAGE. I also have the video record of his presentation, but not yet the approval to share it publicly on YouTube. However, you can ask me for it by writing to abouthydrology @ gmail.com.

Korbinian's generator is written in Matlab and is available through Github.

Please find below a reference list which includes, besides Korbinian's own works, some other references that I gathered through time.

References

Other available codes