Krzysztof Tesch

Continuous optimisation algorithms

Gdańsk 2016


GDAŃSK UNIVERSITY OF TECHNOLOGY PUBLISHERS CHAIRMAN OF EDITORIAL BOARD Janusz T. Cieśliński

REVIEWER Krzysztof Kosowski

COVER DESIGN Katarzyna Olszonowicz

Published under the permission of the Rector of Gdańsk University of Technology

Gdańsk University of Technology publications may be purchased at http://www.pg.edu.pl/wydawnictwo/katalog. Orders should be sent to [email protected]

No part of this publication may be reproduced, transmitted, transcribed, stored in a retrieval system or translated into any human or computer language in any form by any means without permission in writing of the copyright holder.

Copyright by Gdańsk University of Technology Publishers Gdańsk 2016

ISBN 978-83-7348-680-5

GDAŃSK UNIVERSITY OF TECHNOLOGY PUBLISHERS

Edition I. Publishing sheet 8,6, sheet printing 13,25, 1147/947


Contents

1 Introduction
  1.1 Standard problem formulation
  1.2 Global and local minima
  1.3 Feasibility problem
  1.4 Example
  1.5 Classification of optimisation problems
  1.6 Classification of algorithms
  1.7 Hyperoptimisation
  1.8 Test functions
    1.8.1 Multimodal test function
    1.8.2 Unimodal test function
  1.9 Products

2 Single-point, derivative-based algorithms
  2.1 Introduction
    2.1.1 Classification
    2.1.2 Gradient and Hessian of a function
      2.1.2.1 Gradient
      2.1.2.2 Hessian
  2.2 Newton's method
  2.3 Modified Newton's method
  2.4 Method of steepest descent
  2.5 Quasi-Newton methods
    2.5.1 Secant method
    2.5.2 Other methods
  2.6 Conjugate gradient method
  2.7 Conditions for optimality

3 Single-point, derivative-free algorithms
  3.1 Random variables and stochastic processes
    3.1.1 Selected random variables
      3.1.1.1 Discrete uniform distribution
      3.1.1.2 Continuous uniform distribution
      3.1.1.3 Normal distribution
      3.1.1.4 Levy alpha-stable distribution
    3.1.2 Selected stochastic processes
      3.1.2.1 Wiener process
      3.1.2.2 Levy flight
  3.2 Random walk
    3.2.1 Uncontrolled random walk
    3.2.2 Domain controlled random walk
    3.2.3 Position controlled random walk
  3.3 Simulated annealing
  3.4 Random jumping

4 Multi-point, derivative-free algorithms
  4.1 Introduction
    4.1.1 (Meta)heuristic
    4.1.2 Nature-inspired algorithms
  4.2 Physics-based algorithms
    4.2.1 Gravitational search algorithm
  4.3 Bio-inspired algorithms
    4.3.1 Genetic algorithms
      4.3.1.1 Evolutionary algorithms
      4.3.1.2 Binary representation
      4.3.1.3 Floating-point representation
    4.3.2 Differential evolution
    4.3.3 Flower pollination algorithm
  4.4 Swarm intelligence based algorithms
    4.4.1 Particle swarm optimisation
    4.4.2 Accelerated particle swarm optimisation
    4.4.3 Firefly algorithm
    4.4.4 Bat algorithm
    4.4.5 Cuckoo search

5 Constraints
  5.1 Unconstrained and constrained optimisation
  5.2 Lagrange multipliers
    5.2.1 The method
    5.2.2 Equality constraints
    5.2.3 Inequality constraints
    5.2.4 Equality and inequality constraints
    5.2.5 Box constraints
  5.3 Penalty function method
  5.4 Barrier method

6 Variational calculus
  6.1 Functional and its variation
    6.1.1 Necessary condition for an extremum
    6.1.2 The Euler equation
    6.1.3 Constraints
  6.2 Classic problems
    6.2.1 Shortest path on a plane
    6.2.2 Brachistochrone
    6.2.3 Minimal surface of revolution
    6.2.4 Isoperimetric problem
    6.2.5 Geodesics
    6.2.6 Minimal surface passing through a closed curve in space
    6.2.7 Variational formulation of elliptic partial differential equations
  6.3 Variational method of finding streamlines in ring cascades for creeping flows
    6.3.1 Introduction
    6.3.2 Conservation equation in curvilinear coordinate systems
    6.3.3 Dissipation function and dissipation power
    6.3.4 Analytical solutions
    6.3.5 Dissipation functional
    6.3.6 Dissipation functional vs. equations of motion
    6.3.7 Streamlines
      6.3.7.1 Both ends constrained
      6.3.7.2 One end partly constrained
      6.3.7.3 One end unconstrained
    6.3.8 Summary
  6.4 Minimum drag shape bodies moving in inviscid fluid
    6.4.1 Problem formulation
    6.4.2 Fluid resistance
      6.4.2.1 Drag force
      6.4.2.2 Pressure coefficients and its approximation
    6.4.3 Two-dimensional problem
    6.4.4 Three-dimensional problem
      6.4.4.1 Functional and Euler equation
      6.4.4.2 Exact pseudo solution
      6.4.4.3 Approximate solution due to the functional
      6.4.4.4 Approximate solution due to form of the function
      6.4.4.5 Approximate solution by means of a Bezier curve
    6.4.5 Summary

7 Multi-objective optimisation
  7.1 Definitions
  7.2 Domination
    7.2.1 The Pareto set
    7.2.2 The Pareto front
  7.3 Scalarisation
    7.3.1 Method of weighted-sum
    7.3.2 Method of target vector
    7.3.3 Method of minimax
  7.4 SPEA
  7.5 Examples
    7.5.1 Two objective fitness functions of a single variable
      7.5.1.1 Analytical solution
      7.5.1.2 Single objective reconstruction of Pareto set
      7.5.1.3 Multi-objective SPEA
    7.5.2 Two objective fitness functions of two variables
      7.5.2.1 Analytical solution
      7.5.2.2 Single objective reconstruction of Pareto set
      7.5.2.3 Multi-objective SPEA
  7.6 Multi-objective description of Murray's law
    7.6.1 Introduction
    7.6.2 Multi-objective description

8 Statistical analysis
  8.1 Distributions
  8.2 Discrepancy
  8.3 Single-problem statistical analysis
  8.4 Multiple-problem statistical analysis
    8.4.1 D = 2, 100 evaluations
    8.4.2 D = 2, 400 evaluations
    8.4.3 D = 2, 2000 evaluations
    8.4.4 D = 10, 10^4 evaluations

Bibliography

A Codes
  A.1 Single-point, derivative-free algorithms
  A.2 Multi-point, derivative-free algorithms
  A.3 Miscellaneous

B AGA – Advanced Genetic Algorithm
  B.1 Brief introduction
  B.2 Detailed introduction
  B.3 I/O Console
  B.4 Script writing


Chapter 1

Introduction

1.1 Standard problem formulation

If the objective function to be minimised is f : R^D → R, then the standard (unconstrained) optimisation problem is

\min_x f(x) = f_0 \qquad (1.1)

where a D-dimensional point is x = (x_1, x_2, \dots, x_D). What is more, x ∈ R^D is also referred to as an independent variable. As a consequence, the general problem of unconstrained optimisation is the process of optimising (minimising or maximising) an objective function f in the absence of constraints on the independent variable. The objective function may, however, be subjected to equality and inequality constraints

g_i(x) = 0, \qquad (1.2a)

h_j(x) \le 0. \qquad (1.2b)

Hence, the constrained optimisation problem is the process of optimising an objective function f in the presence of constraints on the independent variable x. One can deal with a maximisation problem by negating the objective function

\max_x f(x) = \min_x \left(-f(x)\right). \qquad (1.3)

The argument x_0 of the minimum value of the objective function f is expressed as

x_0 = \arg\min_x f(x) \qquad (1.4)

so that f_0 = f(x_0). The \arg\min operator (1.4) is defined by

\arg\min_x f(x) := \{x : \forall_y\, f(y) \ge f(x)\} \qquad (1.5)

and gives a point x_0 ∈ R^D or a set of points, whereas the \min operator (1.1) gives the minimum value f_0 ∈ R

\min_x f(x) := \{f(x) : \forall_y\, f(y) \ge f(x)\}. \qquad (1.6)


1.2 Global and local minima

Firstly, let us introduce the so-called neighbourhood of a point x_0 with radius r > 0, namely

B(x_0, r) := \{x : 0 < \|x - x_0\| < r\}. \qquad (1.7)

Consequently, a local minimum x_0 is defined as a point for which

x_0 = \arg\min_{x \in B(x_0, r)} f(x). \qquad (1.8)

In other words, a point x_0 is a local minimum of the objective function f if f(x_0) ≤ f(x) for all x fulfilling 0 < \|x - x_0\| < r.

A global minimum g is defined as a point for which

g = \arg\min_{x \in \Omega \subseteq R^D} f(x) \qquad (1.9)

or, in other words, a point g is a global minimum of the objective function f if f(g) ≤ f(x) for all x ∈ Ω. If Ω = R^D one deals with unconstrained optimisation. On the other hand, if Ω ⊂ R^D then it is a constrained problem and Ω is a feasible region, or simply the search (optimisation) space.

1.3 Feasibility problem

If there is no objective function f : R^D → R to be minimised, or when the objective function values are the same for all x ∈ Ω, then the optimisation problem is called a feasibility problem. That is to say, any feasible point x ∈ Ω is an optimal solution. The feasibility problem is also referred to as the satisfiability problem.

The feasible region Ω is a set of points that satisfies all constraints (discussed in chapter 5), namely equality and inequality constraints

g_i(x) = 0, \qquad (1.10a)

h_j(x) \le 0 \qquad (1.10b)

or

\Omega = \{x : g_i(x) = 0,\ h_j(x) \le 0\}. \qquad (1.11)

1.4 Example

Let us consider the following two-dimensional objective function, the so-called sphere function

f(x) := \sum_{i=1}^{2} x_i^2. \qquad (1.12)

For the sake of simplicity, we first assume a discrete search domain Ω = {x_1, x_2, x_3} where x_1 = (1, 2), x_2 = (3, 1), x_3 = (1, 0). Consequently, according to equation (1.12), the values of the objective function are f(x_1) = 5, f(x_2) = 10, f(x_3) = 1. Thus the best solution (the argument of the minimum value of f) is

g = \arg\min_{x_j \in \Omega} f(x_j) = \arg\min \{f(1, 2), f(3, 1), f(1, 0)\} = (1, 0) \qquad (1.13)

and the minimum value f_0 = f(g)

f_0 = \min_{x_j \in \Omega} f(x_j) = \min \{f(1, 2), f(3, 1), f(1, 0)\} = \min \{5, 10, 1\} = 1. \qquad (1.14)

Next, let us consider a continuous search domain Ω = R^2, meaning that our problem is unconstrained. A two-dimensional plot of equation (1.12) is shown in figure 2.2. Obviously, the argument of the minimum value of f is

g = \arg\min_{x \in R^2} f(x) = (0, 0) \qquad (1.15)

and the minimum value f_0 = f(g) = f(0, 0)

f_0 = \min_{x \in R^2} f(x) = f(0, 0) = 0. \qquad (1.16)
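As a quick illustration of the discrete part of this example, the minimum value (1.14) and its argument (1.13) can be computed directly. The following is a minimal Python sketch (NumPy is assumed; it is not the code from appendix A):

import numpy as np

def sphere(x):
    # Sphere function (1.12): sum of squared coordinates
    return np.sum(np.asarray(x, dtype=float) ** 2)

# Discrete search domain Omega = {x1, x2, x3}
omega = [(1, 2), (3, 1), (1, 0)]

values = [sphere(x) for x in omega]    # [5.0, 10.0, 1.0]
g = omega[int(np.argmin(values))]      # argument of the minimum, eq. (1.13)
f0 = min(values)                       # minimum value, eq. (1.14)

print(g, f0)                           # (1, 0) 1.0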

1.5 Classification of optimisation problems

Generally speaking, various optimisation problems can be loosely classified as follows, based on:

• Objective function
  – Single objective. In the case of single objective optimisation we deal with only one objective function. Most of the presented problems are single objective.
  – Multi-objective. More than one objective function is simultaneously minimised. Importantly, the considered objective functions should be in conflict. Typically, multi-objective optimisation gives a set of solutions as a result. Chapter 7 deals with multi-objective optimisation.

• Modality
  – Unimodal. A problem (function) is unimodal if there is only one local minimum. Single-point, derivative-based algorithms (chapter 2) are particularly suitable for such problems. Figure 1.2 shows an example of a unimodal function.
  – Multimodal. A problem (function) is multimodal if there is more than one local minimum. Such problems are not suitable for derivative-based algorithms. Derivative-free algorithms (chapters 3 and 4) are able to deal with multimodal functions better. Figure 1.1 shows a multimodal function.

• Linearity
  – Linear. The objective function is linear, together with the constraints, if any.
  – Nonlinear. The objective function or the constraints (if any), or both of them, are nonlinear. All of the presented algorithms are suitable for nonlinear functions.

• Variable type
  – Continuous. The optimisation variables are continuous (continuous sets of real numbers). All of the presented algorithms are suitable for continuous variables.
  – Discrete. The optimisation variables are discrete (integer numbers).
  – Mixed. A combination of the two above. For instance, one variable is continuous and the second discrete.

• Constraints
  – Constrained. The process of minimising the objective function f in the presence of constraints on the independent variable. We can distinguish equality and inequality constraints. Figures 5.1, 5.2 and 5.3 display examples of constrained functions.
  – Unconstrained. The process of minimising the objective function f in the absence of constraints on the independent variable.

1.6 Classification of algorithms

Optimisation algorithms can be divided based on:

• Derivative
  – Derivative-based. Derivative-based algorithms require first or second derivatives of the objective function. Ideally, the objective function should be twice differentiable. Derivative-based algorithms (chapter 2) are regarded as classical optimisation algorithms suitable for unimodal problems.
  – Derivative-free. Derivative-free algorithms do not require derivatives of the objective function. Moreover, the objective function does not have to be continuous.

• Point
  – Single-point. Single-point algorithms (chapters 2 and 3) process a single point iteratively, constantly modifying and improving it.
  – Multi-point
    ∗ Sequential. Algorithms process single points sequentially. Typically, there is no exchange of information.
    ∗ Parallel. Algorithms process many points in parallel in order to communicate and exchange information (chapter 4).

• Randomness
  – Deterministic. Algorithms comprise only known parameters. There is no uncertainty or randomisation.
  – Stochastic. Randomisation through stochastic variables is introduced in order to efficiently explore the feasible region.
  – Hybrid. A combination of the two above.

• ‘Globality’
  – Local. Derivative-based algorithms are typically local optimisation algorithms (chapter 2) unless the objective function is unimodal.
  – Global. Single-point (chapter 3) and multi-point, derivative-free algorithms (chapter 4) are considered global optimisation algorithms.


1.7 Hyperoptimisation

Hyperoptimisation or metaoptimisation is regarded as the optimisation of optimisation algorithms. It is also referred to as tuning. Parameter tuning may be relevant in order to improve the performance of stochastic methods, for instance in terms of minimising the number of iterations. It is obvious that a poor set of parameters can decrease the performance of an algorithm. Ideally, properly tuned algorithms should be able to solve a whole variety of different problems, or at least a given set of problems, with very good performance.

What is important is the performance measure utilised during the tuning. The obvious choice, though not the only one, is the number of iterations of the tuned algorithm or, more generally, the computational cost. Hyperoptimisation is by no means a trivial problem. At least two approaches to it are considered [4]: configuring an algorithm by choosing optimal parameters, and analysing an algorithm by studying how its performance depends on its parameters. Also, two types of parameters are considered, i.e., qualitative (e.g. the type of representation, binary vs floating-point) and quantitative (e.g. the value of the crossover probability), which makes the whole problem even more complicated. Apart from parameter tuning, discussed above, there is also the so-called parameter control problem, when parameters undergo changes while the algorithm is running.

1.8 Test functions

Two simple functions are introduced here in order to graphically evaluate characteristics of the discussed algorithms. More complicated test functions, typically used as benchmarks, are discussed in chapter 8. These include, among others, unimodal, multimodal, composition, separable and non-separable functions.


Figure 1.1: Multimodal test function


1.8.1 Multimodal test function

The multimodal test function given by

f(x, y) := -5 \sin x \sin y - \sin 7x \sin 7y \qquad (1.17)

is a two-dimensional, nonlinear function. It is shown in figure 1.1. There are several local minima. The search space Ω is

\Omega = \{(x, y) : (x, y) \in [0; \pi]^2\}. \qquad (1.18)

It is also regarded as a box constraint set. The global minimum value of the function (1.17) is

\min_{(x, y) \in \Omega} f(x, y) = -6. \qquad (1.19)

The argument of the minimum value −6 of the function (1.17) is located at the centre of the search space Ω

\arg\min_{(x, y) \in \Omega} f(x, y) = \left(\frac{\pi}{2}, \frac{\pi}{2}\right). \qquad (1.20)

The multimodal function (1.17) is utilised in order to graphically evaluate characteristics of derivative-free algorithms.
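The function (1.17) is easy to reproduce; the following minimal Python sketch (NumPy assumed, function name illustrative) evaluates it at the centre of the box and performs a crude grid check of the minimum value:

import numpy as np

def multimodal(x, y):
    # Multimodal test function (1.17) on the box [0, pi]^2
    return -5.0 * np.sin(x) * np.sin(y) - np.sin(7.0 * x) * np.sin(7.0 * y)

# Value at the centre of the search space, eqs. (1.19)-(1.20)
print(multimodal(np.pi / 2, np.pi / 2))   # -6.0

# Crude grid check that no grid point lies below -6
xs = np.linspace(0.0, np.pi, 401)
X, Y = np.meshgrid(xs, xs)
print(multimodal(X, Y).min())             # close to -6.0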


Figure 1.2: Unimodal test function

1.8.2 Unimodal test function

The unimodal test function given by

f(x, y) := (x - 1)^4 + (y - 1)^4 \qquad (1.21)

is a two-dimensional, nonlinear function. It is shown in figure 1.2. There is only one global minimum. The search space Ω (box constraint set) is

\Omega = \{(x, y) : (x, y) \in [0; \pi]^2\}. \qquad (1.22)

The global minimum value of the function (1.21) is

\min_{(x, y) \in \Omega} f(x, y) = 0 \qquad (1.23)

and the argument of the minimum value 0 of the function (1.21) is located at

\arg\min_{(x, y) \in \Omega} f(x, y) = (1, 1). \qquad (1.24)

The unimodal function (1.21) is utilised in order to graphically evaluate characteristics of derivative-based algorithms.

1.9 Products

There are several products of two vectors commonly met in optimisation. These include:

• Dot product. The dot product, denoted as ·, of two vectors x, y of the same size is a scalar. It is defined as

x \cdot y = \sum_{i=1}^{D} x_i y_i. \qquad (1.25)

The dot product is also referred to as the inner or scalar product. What is more, the dot product is commutative, meaning that x · y = y · x.

• Dyadic product. The dyadic product, denoted with no multiplication sign, of two vectors x, y of the same size is a matrix. It is defined as

xy = (x_i y_j) = \begin{pmatrix} x_1 y_1 & x_1 y_2 & \dots & x_1 y_D \\ x_2 y_1 & x_2 y_2 & \dots & x_2 y_D \\ \vdots & \vdots & \ddots & \vdots \\ x_D y_1 & x_D y_2 & \dots & x_D y_D \end{pmatrix}. \qquad (1.26)

The dyadic product is also referred to as the outer or tensor product. If the first vector is an operator, such as the gradient ∇, or the vectors are not of the same size, then the dyadic product is not commutative.

• Hadamard product. The Hadamard product, denoted as ∘, of two vectors x, y of the same size is a vector where each element is the product of the corresponding elements of the two vectors. It is defined as

x \circ y = (x_i y_i) = (x_1 y_1, \dots, x_D y_D). \qquad (1.27)

The Hadamard product is also referred to as the entrywise product. Consequently, the Hadamard product is commutative, i.e., x ∘ y = y ∘ x.
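All three products are available directly in NumPy; the snippet below is a minimal illustration for two small vectors (it is only a sketch, not part of the book's code):

import numpy as np

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])

dot = np.dot(x, y)        # dot product (1.25): scalar, 32.0
dyadic = np.outer(x, y)   # dyadic (outer) product (1.26): 3x3 matrix with entries x_i * y_j
hadamard = x * y          # Hadamard product (1.27): elementwise vector [4, 10, 18]

print(dot)
print(dyadic)
print(hadamard)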


Chapter 2

Single-point, derivative-based algorithms

2.1 Introduction

2.1.1 Classification

Single-point, derivative-based algorithms can be divided into the following main groups, based on the information about derivatives necessary in order to find a minimum of the objective function:

• Newton's and modified Newton's method
• Method of steepest descent
• Quasi-Newton methods
  – Secant method
  – Other methods (DFP, BFGS)
• Conjugate gradient method

Newton's method and its modified version use the gradient (first derivatives) and the Hessian matrix (second derivatives) of the objective function, whereas the method of steepest descent uses only the gradient. Other quasi-Newton methods try to approximate the Hessian matrix and can be regarded as a certain generalisation of the secant method.

2.1.2 Gradient and Hessian of a function

2.1.2.1 Gradient

For differentiable scalar functions of several variables f : R^D → R the gradient is the vector whose components consist of the partial derivatives of f

\nabla f := \left(\frac{\partial f}{\partial x_1}, \dots, \frac{\partial f}{\partial x_D}\right) = \left(\frac{\partial f}{\partial x_i}\right). \qquad (2.1)

The gradient can also be regarded as a vector field pointing in the direction in which the function f displays the largest rate of increase. Apart from the direction, the magnitude of the gradient \|\nabla f\| determines the rate of change towards that direction.


If the gradient cannot be determined analytically, finite difference approximations of the first order partial derivatives are used instead. The central difference of ∂f/∂x_i, being second order accurate, is then

\frac{\partial f}{\partial x_i} \approx \frac{f(\dots, x_i + h, \dots) - f(\dots, x_i - h, \dots)}{2h} \qquad (2.2)

where h is a small, fixed differentiation step size. Alternatively, a relative step size ε can be assumed, resulting in

h = \begin{cases} \varepsilon \|x\| & \text{if } \|x\| > \varepsilon, \\ \varepsilon & \text{if } \|x\| \le \varepsilon. \end{cases} \qquad (2.3)

If the function f : R^2 → R is two-dimensional then the central difference approximations simplify to

\frac{\partial f}{\partial x} \approx \frac{f(x + h, y) - f(x - h, y)}{2h}, \qquad (2.4a)

\frac{\partial f}{\partial y} \approx \frac{f(x, y + h) - f(x, y - h)}{2h}. \qquad (2.4b)

This holds, however, only for the same step size in the x and y directions.
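A minimal Python sketch of the central difference (2.2), assuming a fixed step size h and a NumPy-style objective function (the helper name is illustrative):

import numpy as np

def grad_central(f, x, h=1e-5):
    # Central-difference gradient of f at x, eq. (2.2)
    x = np.asarray(x, dtype=float)
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2.0 * h)
    return g

# Example: sphere function, gradient at (1, 2) is close to (2, 4)
print(grad_central(lambda x: np.sum(x**2), [1.0, 2.0]))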

2.1.2.2 Hessian

For twice differentiable scalar functions of several variables f : R^D → R the Hessian matrix is a square matrix whose components consist of the second order partial derivatives of f. Provided that the second order derivatives are continuous, the Hessian matrix is symmetric

H := \left(\frac{\partial^2 f}{\partial x_i \partial x_j}\right) = \begin{pmatrix} \frac{\partial^2 f}{\partial x_1^2} & \frac{\partial^2 f}{\partial x_1 \partial x_2} & \dots & \frac{\partial^2 f}{\partial x_1 \partial x_D} \\ \frac{\partial^2 f}{\partial x_1 \partial x_2} & \frac{\partial^2 f}{\partial x_2^2} & \dots & \frac{\partial^2 f}{\partial x_2 \partial x_D} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_1 \partial x_D} & \frac{\partial^2 f}{\partial x_2 \partial x_D} & \dots & \frac{\partial^2 f}{\partial x_D^2} \end{pmatrix}. \qquad (2.5)

The finite difference approximation of the second order partial derivatives can be used in order to evaluate the Hessian matrix. The symmetric difference of ∂²f/∂x_i², being second order accurate, is

\frac{\partial^2 f}{\partial x_i^2} \approx \frac{f(\dots, x_i + h, \dots) - 2f(\dots, x_i, \dots) + f(\dots, x_i - h, \dots)}{h^2} \qquad (2.6)

and of the mixed derivatives ∂²f/∂x_i∂x_j, respectively,

\frac{\partial^2 f}{\partial x_i \partial x_j} \approx \frac{f(\dots, x_i + h, x_j + h, \dots) + f(\dots, x_i - h, x_j - h, \dots) - f(\dots, x_i + h, x_j - h, \dots) - f(\dots, x_i - h, x_j + h, \dots)}{4h^2}. \qquad (2.7)

If the function f : R^2 → R is two-dimensional then the symmetric difference approximations simplify to

\frac{\partial^2 f}{\partial x^2} \approx \frac{f(x + h, y) - 2f(x, y) + f(x - h, y)}{h^2}, \qquad (2.8a)

\frac{\partial^2 f}{\partial y^2} \approx \frac{f(x, y + h) - 2f(x, y) + f(x, y - h)}{h^2}, \qquad (2.8b)

\frac{\partial^2 f}{\partial x \partial y} \approx \frac{f(x + h, y + h) + f(x - h, y - h) - f(x - h, y + h) - f(x + h, y - h)}{4h^2}. \qquad (2.8c)

Again, this holds for the same step size in the x and y directions.
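A minimal Python sketch of equations (2.8a)-(2.8c) for a two-dimensional function, assuming a common step size h (the helper name is illustrative):

import numpy as np

def hessian_2d(f, x, y, h=1e-4):
    # Finite-difference Hessian of f(x, y), eqs. (2.8a)-(2.8c)
    fxx = (f(x + h, y) - 2.0 * f(x, y) + f(x - h, y)) / h**2
    fyy = (f(x, y + h) - 2.0 * f(x, y) + f(x, y - h)) / h**2
    fxy = (f(x + h, y + h) + f(x - h, y - h)
           - f(x - h, y + h) - f(x + h, y - h)) / (4.0 * h**2)
    return np.array([[fxx, fxy], [fxy, fyy]])

# Example: f(x, y) = x^2 + y^2 has Hessian [[2, 0], [0, 2]] everywhere
print(hessian_2d(lambda x, y: x**2 + y**2, 0.5, -1.0))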

2.2 Newton’s method

The idea behind Newton's method is to approximate f by a quadratic function around x_0 at each iteration. Subsequently, an attempt to minimise that approximation is undertaken.

Let us consider a one-dimensional function f : R → R first. Assuming that f has continuous derivatives over a certain interval, the Taylor expansion is used

f(x_0 + \Delta x) = \sum_{n=0}^{m-1} \frac{d^n f(x_0)}{n!} + \frac{d^m f(c)}{m!} \qquad (2.9)

where x = x_0 + Δx, c = x_0 + θΔx and θ ∈ ]0; 1[. The above equation may also be written as

f(x) = f(x_0) + f'(x_0)\Delta x + \tfrac{1}{2} f''(x_0)\Delta x^2 + \tfrac{1}{6} f'''(c)\Delta x^3. \qquad (2.10)

The third derivative is evaluated at the unknown point c. Discarding (truncating) the last term one gets a quadratic approximation to f

f(x) \approx f(x_0) + f'(x_0)\Delta x + \tfrac{1}{2} f''(x_0)\Delta x^2. \qquad (2.11)

A necessary condition for optimality of f is f'(x) = 0. Differentiating the above equation with respect to x or Δx = x − x_0 and taking advantage of the necessary condition, one gets

0 = f'(x_0) + f''(x_0)\Delta x. \qquad (2.12)

Solving the above for Δx it is possible to provide the following equation

\Delta x = -\frac{f'(x_0)}{f''(x_0)}. \qquad (2.13)

Finally, an iterative sequence can now be constructed in order to get a better approximation x_{n+1} to the equation f'(x) = 0

x_{n+1} = x_n - \frac{f'(x_n)}{f''(x_n)}. \qquad (2.14)


Following the same line of reasoning for functions of several variables f : R^D → R, we have an equivalent of equation (2.11)

f(x) \approx f(x_0) + \nabla f(x_0) \cdot \Delta x + \tfrac{1}{2}\, \Delta x \cdot H(x_0) \cdot \Delta x. \qquad (2.15)

A necessary condition for optimality of f is now ∇f(x) = 0 or ∇f(Δx) = 0. Differentiating equation (2.15) with respect to x or Δx and taking advantage of the necessary condition, we have an equivalent of equation (2.12)

0 = \nabla f(x_0) + H(x_0) \cdot \Delta x. \qquad (2.16)

It is now possible to solve the above equation for Δx

\Delta x = -H^{-1}(x_0) \cdot \nabla f(x_0) \qquad (2.17)

and provide the following iterative scheme equivalent to (2.14)

x_{n+1} = x_n - H^{-1}(x_n) \cdot \nabla f(x_n). \qquad (2.18)

The structure of Newton's method is shown in listing 2.1. The algorithm stops when ‖x_{n+1} − x_n‖ ≤ ε_max, i.e., when the difference between the previous and current solution is below an assumed accuracy ε, or when the maximum number of iterations n_max is reached.

Input: α_n, n_max, ε_max, x_0
Output: x_0

1 n := 0;
2 repeat
3   x_{n+1} := x_n − H^{-1}(x_n) · ∇f(x_n);
4   ε := ‖x_{n+1} − x_n‖;
5   n := n + 1;
6 until n < n_max and ε ≥ ε_max;
7 x_0 := x_{n−1};

Algorithm 2.1: Newton's method pseudocode

Newton's method does not only take advantage of the maximal direction of change, as the method of steepest descent (discussed further on) does. It also corrects the search direction by weighting gradients with the inverse of the Hessian matrix. This means that it directs the search towards the minimum rather than towards the maximal direction of change, which is possible because of the second order derivatives. There is, however, a drawback of Newton's method, namely the cost of additional function evaluations. Furthermore, this method converges for initial points close to the optimal value. What is more, the Hessian matrix H has to be positive definite, otherwise the method can be divergent.
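A minimal Python sketch of the multidimensional iteration (2.18) with the stopping rule of listing 2.1, assuming callables for the gradient and the Hessian are supplied (this is an illustrative implementation, not the book's code):

import numpy as np

def newton(grad, hess, x0, n_max=100, eps_max=1e-8):
    # Newton's method, eq. (2.18): x_{n+1} = x_n - H^{-1}(x_n) grad f(x_n)
    x = np.asarray(x0, dtype=float)
    for _ in range(n_max):
        step = np.linalg.solve(hess(x), grad(x))   # solve H dx = grad f instead of inverting H
        x_new = x - step
        if np.linalg.norm(x_new - x) < eps_max:    # stop when consecutive points are close
            return x_new
        x = x_new
    return x

# Example: f(x, y) = x^2 + y^2, minimum at (0, 0)
grad = lambda x: 2.0 * x
hess = lambda x: 2.0 * np.eye(x.size)
print(newton(grad, hess, [1.0, 2.0]))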

2.3 Modified Newton’s method

One possible approach to generalising Newton's method is the relaxation factor α_n, which can control the step size

x_{n+1} = x_n - \alpha_n H^{-1}(x_n) \cdot \nabla f(x_n). \qquad (2.19)

The value of the relaxation factor can be determined by the solution of a one-dimensional optimisation problem

\alpha_n = \arg\min_\alpha f(x_n - \alpha H^{-1} \cdot \nabla f(x_n)). \qquad (2.20)

The one-dimensional equivalent of equation (2.19) is

x_{n+1} = x_n - \alpha_n \frac{f'(x_n)}{f''(x_n)}. \qquad (2.21)

For α_n := 1 we have equation (2.14). The relaxation factor α_n can be either constant, α ∈ ]0; 1], or adjustable. The structure of the modified Newton's method is shown in listing 2.2.

Input: α_n, n_max, ε_max, x_0
Output: x_0

1 n := 0;
2 repeat
3   α_n := arg min_α f(x_n − αH^{-1} · ∇f(x_n));
4   x_{n+1} := x_n − α_n H^{-1}(x_n) · ∇f(x_n);
5   ε := ‖x_{n+1} − x_n‖;
6   n := n + 1;
7 until n < n_max and ε ≥ ε_max;
8 x_0 := x_{n−1};

Algorithm 2.2: Modified Newton's method pseudocode

2.4 Method of steepest descent

The method of steepest descent is also known as the method of gradient descent. The steepest descent method directs the search towards the maximal direction of change, i.e., towards the direction of the negative gradient. Therefore, it is enough to set H(x_n) := δ in equation (2.19)

x_{n+1} = x_n - \alpha_n \nabla f(x_n). \qquad (2.22)

Following the same logic, with f''(x_n) := 1, it is possible to obtain the one-dimensional equivalent of the above equation

x_{n+1} = x_n - \alpha_n f'(x_n). \qquad (2.23)

As previously, the step size α_n can be either constant, α ∈ ]0; 1], or adjustable. Its actual value can be determined by the solution of a one-dimensional optimisation problem

\alpha_n = \arg\min_\alpha f(x_n - \alpha \nabla f(x_n)). \qquad (2.24)

The structure of the steepest descent method is shown in listing 2.3 for an adjustable step size α_n. For a constant α it is enough to replace line 3 with α_n := α.


Input: n_max, ε_max, x_0
Output: x_0

1 n := 0;
2 repeat
3   α_n := arg min_α f(x_n − α∇f(x_n));
4   x_{n+1} := x_n − α_n ∇f(x_n);
5   ε := ‖x_{n+1} − x_n‖;
6   n := n + 1;
7 until n < n_max and ε ≥ ε_max;
8 x_0 := x_{n−1};

Algorithm 2.3: Steepest descent method pseudocode

The method of steepest descent may perform poorly near the optimal value, since the closer to the minimum, the smaller the gradients, and hence the step sizes, become. This is especially true for a constant step size α, because there is no additional information with which to correct the direction and the step size of the next iteration.

Figure 2.1 displays 29 evaluations of Newton's method and 200 iterations of steepest descent with constant α = 0.015. For the latter approach it was not possible to reach the optimal value. However, only 10 iterations of steepest descent with adjustable α_n according to equation (2.24) were necessary to reach the optimal value within the ε_max := 10^{-5} accuracy. As for the modified Newton's method with adjustable α_n according to equation (2.20), only 2 iterations are necessary.

Figure 2.1: Newton's method vs steepest descent (trajectories: Newton; steepest descent with constant α; steepest descent with adjustable α_n)


2.5 Quasi-Newton methods

2.5.1 Secant method

The secant method for one-dimensional optimisation approximates the second derivative in Newton's equation (2.14) by means of the first order accurate backward finite difference

f''(x_n) \approx \frac{f'(x_n) - f'(x_{n-1})}{x_n - x_{n-1}}. \qquad (2.25)

By that means, the iterative sequence has the following form

x_{n+1} = x_n - f'(x_n)\, \frac{x_n - x_{n-1}}{f'(x_n) - f'(x_{n-1})}. \qquad (2.26)

It has to be noted that two starting values of f' (i.e. f'(x_n) and f'(x_{n-1})) are needed, in comparison with Newton's method equation (2.14). Given that one can store the previously evaluated f'(x_{n-1}), this can hardly be regarded as a drawback of the method.

2.5.2 Other methods

Many quasi-Newton methods, and optimisation methods in general, consist of two steps, namely the formulation of a direction d_n with the step size α_n and the following update formula

x_{n+1} := x_n + \alpha_n d_n. \qquad (2.27)

In other words, a sequence of points (x_n)_{n=0}^{\infty} is created, hopefully leading to an optimal value. The infinite sequence is truncated if ‖x_{n+1} − x_n‖ ≤ ε_max, i.e., when two subsequent points are close enough. Ideally, an initial point x_0 should be located close to the optimal value in order to ensure convergence.

Assuming d_n := −∇f(x_n) in equation (2.27), it is possible to obtain the steepest descent method, equation (2.22). If

d_n := -H^{-1}(x_n) \cdot \nabla f(x_n) \qquad (2.28)

then one obtains the modified Newton's method according to equation (2.19), which is the starting point for quasi-Newton methods. These methods make an attempt to approximate the inverse of the Hessian matrix, which is now denoted as M_n. Thus the direction is now

d_n := -M_n \cdot \nabla f(x_n). \qquad (2.29)

Furthermore, the adjustable step size α_n is determined by the solution of a one-dimensional optimisation

\alpha_n = \arg\min_\alpha f(x_n + \alpha d_n). \qquad (2.30)

The structure of the general quasi-Newton algorithm is shown in listing 2.4.


Input: n_max, ε_max, x_0, M_0
Output: x_0

1 n := 0;
2 repeat
3   d_n := −M_n · ∇f(x_n);
4   α_n := arg min_α f(x_n + αd_n);
5   x_{n+1} := x_n + α_n d_n;
6   Calculate ∇f(x_{n+1});
7   Update M_{n+1};
8   ε := ‖x_{n+1} − x_n‖;
9   n := n + 1;
10 until n < n_max and ε ≥ ε_max;
11 x_0 := x_{n−1};

Algorithm 2.4: General quasi-Newton method pseudocode

Quasi-Newton methods do not make explicit use of the Hessian matrix H_n or its inverse. A successively updated approximation M_n of H_n^{-1} is used instead. Typically, M_0 := δ. The DFP method (Davidon-Fletcher-Powell) updates M_n by means of the following equation

M_{n+1} = M_n + \frac{\Delta x_n \Delta x_n}{\Delta x_n \cdot w_n} - \frac{(M_n \cdot w_n)(w_n \cdot M_n)}{w_n \cdot M_n \cdot w_n} \qquad (2.31)

where Δx_n := x_{n+1} − x_n and w_n := ∇f(x_{n+1}) − ∇f(x_n). The above update uses subsequent gradients. The same concerns the BFGS method (Broyden-Fletcher-Goldfarb-Shanno). This time M_n is updated by

M_{n+1} = \left(\delta - \frac{\Delta x_n w_n}{\Delta x_n \cdot w_n}\right) \cdot M_n \cdot \left(\delta - \frac{w_n \Delta x_n}{\Delta x_n \cdot w_n}\right) + \frac{\Delta x_n \Delta x_n}{\Delta x_n \cdot w_n}. \qquad (2.32)
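A minimal Python sketch of the BFGS update (2.32) of the inverse-Hessian approximation M_n, with the dyadic products written out via np.outer (the function name and the example step values are illustrative assumptions, not the book's code):

import numpy as np

def bfgs_update(M, dx, w):
    # BFGS update of the inverse-Hessian approximation, eq. (2.32)
    # M  : current approximation M_n of H^{-1}
    # dx : step x_{n+1} - x_n
    # w  : gradient difference grad f(x_{n+1}) - grad f(x_n)
    I = np.eye(dx.size)
    rho = 1.0 / np.dot(dx, w)                 # 1 / (dx . w)
    A = I - rho * np.outer(dx, w)             # (delta - dx w / (dx . w))
    B = I - rho * np.outer(w, dx)             # (delta - w dx / (dx . w))
    return A @ M @ B + rho * np.outer(dx, dx)

# Example: one update starting from M_0 = delta (identity)
M0 = np.eye(2)
dx = np.array([0.1, -0.2])
w = np.array([0.2, -0.4])
print(bfgs_update(M0, dx, w))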

2.6 Conjugate gradient method

The first step of the conjugate gradient method (listing 2.5) is simply the steepest descent method, i.e. d_0 := −∇f(x_0). The adjustable step size α_n in the update formula (2.27) is calculated according to the one-dimensional optimisation in equation (2.24). Subsequent iterations include an additional term in the update formula, namely β_n d_n. Together, the gradient descent direction and the additional term are referred to as the conjugate direction d_{n+1}

d_{n+1} := -\nabla f(x_{n+1}) + \beta_n d_n. \qquad (2.33)

Similarly to the first step, the adjustable step size α_n is obtained as a result of a one-dimensional optimisation

\alpha_n := \arg\min_\alpha f\left(x_n + \alpha \left(-\nabla f(x_{n+1}) + \beta_n d_n\right)\right). \qquad (2.34)

The most popular choice of β_n is due to Fletcher and Reeves

\beta_n := \frac{\nabla f(x_{n+1}) \cdot \nabla f(x_{n+1})}{\nabla f(x_n) \cdot \nabla f(x_n)}. \qquad (2.35)


Input: n_max, ε_max, x_0
Output: x_0

1 n := 0;
2 d_n := −∇f(x_n);
3 repeat
4   α_n := arg min_α f(x_n + αd_n);
5   x_{n+1} := x_n + α_n d_n;
6   Calculate β_n;
7   d_{n+1} := −∇f(x_{n+1}) + β_n d_n;
8   ε := ‖x_{n+1} − x_n‖;
9   n := n + 1;
10 until n < n_max and ε ≥ ε_max;
11 x_0 := x_{n−1};

Algorithm 2.5: Conjugate gradient method pseudocode
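A minimal Python sketch of listing 2.5 with the Fletcher-Reeves coefficient (2.35). For simplicity the one-dimensional minimisation over α is replaced here by a crude grid search, which is an assumption of this sketch, not the book's choice:

import numpy as np

def line_search(f, x, d, alphas=np.linspace(1e-4, 1.0, 200)):
    # Crude one-dimensional minimisation of f(x + alpha*d) over a grid of alphas
    values = [f(x + a * d) for a in alphas]
    return alphas[int(np.argmin(values))]

def conjugate_gradient(f, grad, x0, n_max=100, eps_max=1e-8):
    # Nonlinear conjugate gradient with the Fletcher-Reeves beta, eq. (2.35)
    x = np.asarray(x0, dtype=float)
    d = -grad(x)
    for _ in range(n_max):
        alpha = line_search(f, x, d)
        x_new = x + alpha * d
        g_old, g_new = grad(x), grad(x_new)
        beta = np.dot(g_new, g_new) / np.dot(g_old, g_old)   # Fletcher-Reeves coefficient
        d = -g_new + beta * d                                 # conjugate direction (2.33)
        if np.linalg.norm(x_new - x) < eps_max:
            return x_new
        x = x_new
    return x

# Example: quadratic bowl, minimum at (1, 1)
f = lambda x: (x[0] - 1.0)**2 + 4.0 * (x[1] - 1.0)**2
grad = lambda x: np.array([2.0 * (x[0] - 1.0), 8.0 * (x[1] - 1.0)])
print(conjugate_gradient(f, grad, [0.0, 0.0]))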

2.7 Conditions for optimality

A necessary condition for optimality of a twice continuously differentiable function f : R^D → R in unconstrained optimisation problems is

\nabla f(x_0) = 0. \qquad (2.36)

The point x_0, or points, if any, are called stationary points or critical points. The necessary condition (2.36) results in a set of typically nonlinear algebraic equations.

Sufficient conditions for optimality of f : R^D → R in unconstrained optimisation problems require examining the Hessian matrix at the stationary points

H(x_0) := \left(\frac{\partial^2 f(x_0)}{\partial x_i \partial x_j}\right). \qquad (2.37)

This is because at a stationary point we may find a minimum, a maximum, or neither of those. To be more precise, the eigenvalues of the Hessian matrix at the stationary point need to be examined. The determinant

|H(x_0) - \lambda\delta| = 0 \qquad (2.38)

results in a characteristic polynomial with D roots (eigenvalues) λ_i. At x_0 we have:

• a minimum if H(x_0) is positive definite (all λ_i > 0)
• a minimum or a saddle point if H(x_0) is positive semi-definite (all λ_i ≥ 0 and at least one λ_i = 0)
• a maximum if H(x_0) is negative definite (all λ_i < 0)
• a maximum or a saddle point if H(x_0) is negative semi-definite (all λ_i ≤ 0 and at least one λ_i = 0)
• a saddle point if H(x_0) is indefinite (certain λ_i > 0 and certain λ_i < 0)

Alternatively, the Hessian matrix is positive definite if all the subdeterminants (principal minors)

H_n(x_0) := \begin{vmatrix} \frac{\partial^2 f(x_0)}{\partial x_1^2} & \frac{\partial^2 f(x_0)}{\partial x_1 \partial x_2} & \dots & \frac{\partial^2 f(x_0)}{\partial x_1 \partial x_n} \\ \frac{\partial^2 f(x_0)}{\partial x_1 \partial x_2} & \frac{\partial^2 f(x_0)}{\partial x_2^2} & \dots & \frac{\partial^2 f(x_0)}{\partial x_2 \partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2 f(x_0)}{\partial x_1 \partial x_n} & \frac{\partial^2 f(x_0)}{\partial x_2 \partial x_n} & \dots & \frac{\partial^2 f(x_0)}{\partial x_n^2} \end{vmatrix} \qquad (2.39)

for n ∈ {1, . . . , D} are positive, i.e.

\forall_{n \in \{1, \dots, D\}} \; H_n(x_0) > 0. \qquad (2.40)

However, the above criterion cannot be used in order to verify whether the Hessian matrix is positive semi-definite.

Figure 2.2: f(x, y) := x^2 + y^2 plot

The two succeeding examples are provided in order to illustrate the conditions for optimality. Let us first consider a two-dimensional function f : R^2 → R given by the following equation

f(x, y) := x^2 + y^2. \qquad (2.41)

The necessary condition for optimality, ∇f = 0, results in a stationary point (0, 0). Now, the Hessian matrix is

H := \begin{pmatrix} \frac{\partial^2 f}{\partial x^2} & \frac{\partial^2 f}{\partial x \partial y} \\ \frac{\partial^2 f}{\partial x \partial y} & \frac{\partial^2 f}{\partial y^2} \end{pmatrix} = \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix}. \qquad (2.42)

The next step is to examine the eigenvalues of the Hessian matrix. This leads to the determinant

|H(x_0) - \lambda\delta| = 0 \qquad (2.43)

or

\begin{vmatrix} 2 - \lambda & 0 \\ 0 & 2 - \lambda \end{vmatrix} = 0 \qquad (2.44)

resulting in the characteristic polynomial (2 − λ)^2 = 0. The solutions are λ_1 = λ_2 = 2, i.e. H(0, 0) is positive definite. There is a local minimum at (0, 0), see figure 2.2.

Let us now consider a two-dimensional function f : R^2 → R given by the following equation

f(x, y) := x^2 - y^2. \qquad (2.45)

The necessary condition for optimality, ∇f = 0, results in exactly the same stationary point (0, 0) as previously. However, the Hessian matrix is different

H := \begin{pmatrix} \frac{\partial^2 f}{\partial x^2} & \frac{\partial^2 f}{\partial x \partial y} \\ \frac{\partial^2 f}{\partial x \partial y} & \frac{\partial^2 f}{\partial y^2} \end{pmatrix} = \begin{pmatrix} 2 & 0 \\ 0 & -2 \end{pmatrix}. \qquad (2.46)

Examining the eigenvalues of the Hessian matrix, we have the determinant

|H(x_0) - \lambda\delta| = 0 \qquad (2.47)

or

\begin{vmatrix} 2 - \lambda & 0 \\ 0 & -2 - \lambda \end{vmatrix} = 0 \qquad (2.48)

resulting in the characteristic polynomial λ^2 − 4 = 0. The solutions are λ = ±2, i.e. H(0, 0) is indefinite. There is a saddle point at (0, 0), see figure 2.3.
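The eigenvalue test used in the two examples can be reproduced numerically; the sketch below classifies a stationary point from the eigenvalues of its Hessian (the helper name is illustrative):

import numpy as np

def classify_stationary_point(H, tol=1e-12):
    # Classify a stationary point from the eigenvalues of the (symmetric) Hessian H
    lam = np.linalg.eigvalsh(np.asarray(H, dtype=float))
    if np.all(lam > tol):
        return "minimum"
    if np.all(lam < -tol):
        return "maximum"
    if np.any(lam > tol) and np.any(lam < -tol):
        return "saddle point"
    return "inconclusive (semi-definite)"

print(classify_stationary_point([[2, 0], [0, 2]]))    # minimum, eq. (2.42)
print(classify_stationary_point([[2, 0], [0, -2]]))   # saddle point, eq. (2.46)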

Figure 2.3: f(x, y) := x^2 − y^2 plot


Chapter 3

Single-point, derivative-free algorithms

3.1 Random variables and stochastic processes

3.1.1 Selected random variables

A random variable X is a function X : Ω → R from the set of elementary events Ω to the set of real numbers R, provided that the set {ω ∈ Ω : X(ω) < x} is an elementary event. By a random variate one understands a realisation of a random variable, i.e., a random outcome according to the probability distribution function of the random variable. The set of realisations X(Ω) := {X(ω) : ω ∈ Ω} is called the set of values of the variable X. There are two types of random variables, namely discrete and continuous. The former takes a finite or countable list of values associated with a probability mass function, whereas the latter takes any numerical value associated with a probability distribution function.

3.1.1.1 Discrete uniform distribution

The discrete uniform distribution is given in table 3.1. The finite number n of values x_i are equally probable, each with probability 1/n. Furthermore, the probability mass function for n = 5 is shown in figure 3.1, which is also referred to as a histogram.

Table 3.1: Discrete uniform distribution

x_i   x_1   . . .   x_n
p_i   1/n   . . .   1/n

The expected value of the discrete uniform distribution is

EX = \frac{1}{n} \sum_i x_i =: \mu \qquad (3.1)

whereas the variance is

D^2X = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2. \qquad (3.2)

Any particular realisation, or simply random variate, of the discrete uniform distribution is denoted as U{x_1, x_n}. We have U{x_1, x_n} ∈ {x_1, x_2, . . . , x_n}, each value with equal probability 1/n.

Figure 3.1: Probability mass function of a discrete uniform distribution

Figure 3.2: Probability density function of continuous uniform distributions (a = 0, b = 1; a = −b = 2; a = −b = 3)

3.1.1.2 Continuous uniform distribution

The continuous uniform distribution is given by the following probability distribution function

f(x) := \begin{cases} \frac{1}{b - a}, & a \le x \le b; \\ 0, & \text{otherwise} \end{cases} \qquad (3.3)

which is shown in figure 3.2 for various a and b. The expected value of the continuous uniform distribution is

EX = \int_{-\infty}^{+\infty} x f(x)\, dx = \frac{a + b}{2} =: \mu \qquad (3.4)

and the variance is

D^2X = \int_{-\infty}^{+\infty} (x - \mu)^2 f(x)\, dx = \frac{1}{12}(b - a)^2. \qquad (3.5)

Any particular realisation or random variate of the continuous uniform distribution is denoted as U(a, b) or, for more than one dimension, the vector U(a, b). For the standard continuous uniform distribution, denoted as U(0, 1), we have EX = 1/2 and D^2X = 1/12.

3.1.1.3 Normal distribution

The normal distribution is given by the following probability distribution function

f(x) := \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}} \qquad (3.6)

which is shown in figure 3.3 for various σ and µ = 0. The expected value of the normal distribution is EX = µ and the variance D^2X = σ^2. Any particular realisation or random variate of the normal distribution is denoted as N(µ, σ^2) or, for more than one dimension, the vector N(µ, σ^2). The standard normal distribution is denoted as N(0, 1), for which the expected value is EX = 0 and the variance D^2X = 1.

Figure 3.3: Probability density function of normal distributions (µ = 0; σ = 0.75, 1, 2)

Figure 3.4: Probability density function of symmetrical Levy stable distributions (α = 1, 1.5, 2)

3.1.1.4 Levy alpha-stable distribution

The Levy alpha-stable distribution is a four-parameter family of distributions. The parameters are: α – the stability parameter, β – the skewness parameter, µ – the location parameter, and γ – the scale parameter. The probability distribution function f(x, α, β, µ, γ) can be expressed analytically only for a selected group of parameters. It is possible to provide the expected value EX = µ when α > 1 and the variance D^2X = 2γ^2 when α = 2.

When β = µ = 0 the Levy alpha-stable distribution is known as the symmetrical Levy stable distribution L_{α,γ} with the following probability distribution function

f(x, \alpha, \gamma) := \frac{1}{\pi} \int_{0}^{\infty} e^{-\gamma y^\alpha} \cos yx \; dy. \qquad (3.7)

As the Levy distributions are difficult to deal with both analytically and numerically, the following approximation of L_{α,γ} can be used [16]

L_{\alpha,\sigma} := \frac{X}{|Y|^{\frac{1}{\alpha}}} \qquad (3.8)

where Y is a random variable with the standard normal distribution and X is a random variable with the normal distribution with µ = 0 and the standard deviation σ given by

\sigma^\alpha := \frac{\Gamma(1 + \alpha) \sin\frac{\pi\alpha}{2}}{\Gamma\left(\frac{1 + \alpha}{2}\right) \alpha\, 2^{\frac{\alpha - 1}{2}}}. \qquad (3.9)

Any particular realisation or random variate of the symmetrical Levy stable distribution is denoted as

L(\alpha, \sigma) := \frac{\sigma N(0, 1)}{|N(0, 1)|^{\frac{1}{\alpha}}}. \qquad (3.10)
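A minimal Python sketch of the approximation (3.8)-(3.10), sometimes referred to as Mantegna's algorithm; the function name is illustrative and the standard-library gamma function is used:

import numpy as np
from math import gamma, sin, pi

def levy_variate(alpha, size=1, rng=np.random.default_rng()):
    # Random variate of the symmetrical Levy stable distribution, eqs. (3.8)-(3.10)
    sigma = (gamma(1.0 + alpha) * sin(pi * alpha / 2.0)
             / (gamma((1.0 + alpha) / 2.0) * alpha * 2.0 ** ((alpha - 1.0) / 2.0))) ** (1.0 / alpha)
    x = rng.normal(0.0, sigma, size)     # X ~ N(0, sigma^2)
    y = rng.normal(0.0, 1.0, size)       # Y ~ N(0, 1)
    return x / np.abs(y) ** (1.0 / alpha)

print(levy_variate(1.5, size=5))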

3.1.2 Selected stochastic processes

A real-valued function X : T × Ω → R is a random function provided that the set {ω ∈ Ω : X(t, ω) < x} is an elementary event. For a fixed t the function X is a random variable X_t, sometimes denoted as X(t), X_t(ω) or even X(t, ω).

A stochastic process is a set of random variables X_t depending on one parameter, typically time t

\{X_t : t \in T\}. \qquad (3.11)

If the set T is countable, i.e. T := {1, 2, . . .}, then the stochastic process (3.11) can be regarded as a stochastic series (x_n)_{n=1}^{\infty}.


Figure 3.5: Wiener process realisations

3.1.2.1 Wiener process

The Wiener process is an example of a continuous time stochastic process and is characterised by the following properties:

• W(0) = 0 with probability one.
• If 0 < t_1 < t_2 < t_3 < t_4 < τ then W(t_2) − W(t_1) and W(t_4) − W(t_3) are independent.
• If 0 < t_1 < t_2 < τ then W(t_2) − W(t_1) ∼ √(t_2 − t_1) N(0, 1), meaning that the difference W(t_2) − W(t_1) is a random variable with the normal distribution with µ = 0 and variance t_2 − t_1, i.e. N(0, t_2 − t_1).

A method of summing increments is applied to obtain a discrete approximation of the continuous Wiener process, namely

dW = \sqrt{\Delta t}\, N(0, 1) \qquad (3.12)

where Δt = τ/n_max. In order to form a D-dimensional Wiener process, a limited sequence of points (x_n)_{n=0}^{n_{max}} is created, where

x_{n+1} := x_n + \alpha\, \varepsilon \qquad (3.13)

and the random vector ε is drawn from the standard normal distribution for every coordinate

\varepsilon := N(0, 1). \qquad (3.14)

The scale coefficient α is, obviously,

\alpha := \sqrt{\Delta t} = \sqrt{\frac{\tau}{n_{max}}}. \qquad (3.15)

Figure 3.5 displays an example realisation of the Wiener process.
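A minimal Python sketch of the discrete D-dimensional Wiener approximation (3.13)-(3.15), assuming τ = 1 (the function name is illustrative):

import numpy as np

def wiener_path(n_max=1000, tau=1.0, dim=2, rng=np.random.default_rng()):
    # Discrete approximation of a D-dimensional Wiener process, eqs. (3.13)-(3.15)
    alpha = np.sqrt(tau / n_max)                                 # scale coefficient (3.15)
    steps = alpha * rng.standard_normal((n_max, dim))            # increments (3.12)
    path = np.vstack([np.zeros(dim), np.cumsum(steps, axis=0)])  # W(0) = 0
    return path

path = wiener_path()
print(path.shape)   # (1001, 2)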


Figure 3.6: Levy flight realisations

3.1.2.2 Levy flight

The Levy flight, or in fact the Levy alpha-stable walk, is another example of a continuous time stochastic process and is characterised by the following properties:

• X(0) = 0 with probability one.
• If 0 < t_1 < t_2 < t_3 < t_4 < τ then X(t_2) − X(t_1) and X(t_4) − X(t_3) are independent.
• If 0 < t_1 < t_2 < τ then X(t_2) − X(t_1) ∼ (t_2 − t_1)^{1/α} L(α, 1), meaning that the difference X(t_2) − X(t_1) is a random variable with the symmetrical Levy stable distribution with the scale parameter (t_2 − t_1)^{1/α}, i.e. L(α, (t_2 − t_1)^{1/α}).

As previously, a method of summing increments is applied to obtain a discrete approximation of the Levy flight, i.e.

dX = \Delta t^{\frac{1}{\alpha}}\, L(\alpha, 1). \qquad (3.16)

A limited sequence of points (x_n)_{n=0}^{n_{max}} is created in order to form a D-dimensional Levy flight

x_{n+1} := x_n + \alpha_n\, \varepsilon. \qquad (3.17)

This time, however, the random vector ε is drawn from the symmetrical Levy stable distribution

\varepsilon = L(\alpha, \sigma) := \frac{\sigma N(0, 1)}{|N(0, 1)|^{\frac{1}{\alpha}}}. \qquad (3.18)

The scale coefficient α_n, not to be confused this time with the stability parameter α, is

\alpha_n := \Delta t^{\frac{1}{\alpha}} = \left(\frac{\tau}{n_{max}}\right)^{\frac{1}{\alpha}}. \qquad (3.19)

Figure 3.6 (left side) displays an example realisation of the Levy flight. Long jumps are typically parallel to either the x or the y axis. Simultaneous long jumps are hardly probable. In order to simulate such jumps, one can propose the following random vector

\varepsilon := \frac{\sigma N(0, 1)}{|N(0, 1)|^{\frac{1}{\alpha}}} \frac{\varepsilon'}{\|\varepsilon'\|} \qquad (3.20)

where

\varepsilon' := N(0, 1). \qquad (3.21)

However, the proposed random process shown in figure 3.6 (right side) is not a strict Levy flight.

3.2 Random walk

3.2.1 Uncontrolled random walk

A sequence of points (xn)nmaxn=0 is randomly generated in a similar manner to theWiener process, given by equation (3.13),

xn+1 := xn + α ε. (3.22)

The random vector is drawn from the standard normal distribution for every coordinate N(0, 1) and the step size α(U_i − L_i) accounts for the search domain size. Lower and upper domain constraints are denoted as L_i and U_i respectively. Comparing the Wiener process step size (3.15) with the uncontrolled random walk version, one can observe the difference: the step now reflects the search domain size rather than the maximum step number n_max alone. In vector notation we have

ε := (U − L) N(0, 1). (3.23)

One possible form of the α constant could be

α := 1/√(D n_max). (3.24)

Furthermore, the whole step size α(U_i − L_i) can also be regarded as the standard deviation, or the square root of the variance, of N(0, (α(U_i − L_i))²). Finally, the update formula is

x_{n+1} := x_n + (U − L)/√(D n_max) N(0, 1). (3.25)



The algorithm is shown in listing 3.1. As there is no control over whether the random walk stays within the search domain, the method is called the uncontrolled random walk. The last line of algorithm 3.1 stores the current best point, which eventually becomes the global best solution when the maximum number of evaluations is reached. Figure 3.7 displays an example realisation of an uncontrolled random walk for n_max = 100.

Input: α, n_max, L, U
Output: g
1 g := x := L + (U − L) U(0, 1);
2 for n := 1 to n_max − 1 do
3   ε := (U − L) N(0, 1);
4   x := x + α ε;
5   g := arg min{f(g), f(x)};

Algorithm 3.1: Uncontrolled random walk pseudocode
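A direct NumPy transcription of listing 3.1 follows. The box and the sphere objective used at the end are illustrative assumptions of mine, not part of the original text.

import numpy as np

def uncontrolled_random_walk(f, L, U, n_max, seed=0):
    """Algorithm 3.1: x walks freely, g keeps the best point seen so far."""
    rng = np.random.default_rng(seed)
    L, U = np.asarray(L, float), np.asarray(U, float)
    D = L.size
    alpha = 1.0 / np.sqrt(D * n_max)                 # step constant, eq. (3.24)
    g = x = L + (U - L) * rng.uniform(size=D)        # random start inside the box
    for _ in range(n_max - 1):
        eps = (U - L) * rng.standard_normal(D)       # eq. (3.23)
        x = x + alpha * eps                          # eq. (3.22) / (3.25)
        if f(x) < f(g):                              # store the current best point
            g = x
    return g

sphere = lambda x: float(np.sum(x ** 2))             # illustrative objective (assumed)
print(uncontrolled_random_walk(sphere, [-5, -5], [5, 5], 1000))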

Figure 3.7: 100 evaluations of uncontrolled random walk

3.2.2 Domain controlled random walk

The domain controlled random walk is a natural extension of its uncontrolled version. In order to control whether the random walk stays within the search domain Ω, the next step x_{n+1} is only accepted if x_{n+1} ∈ Ω

x_{n+1} := { x_{n+1}, if x_{n+1} ∈ Ω;  x_n, if x_{n+1} ∉ Ω }. (3.26)

This approach does not introduce additional function evaluations f(x_{n+1}) as only positions are checked. Analogously, the update formula is given by equation (3.25). The algorithm is shown in listing 3.2.



Input: α, n_max, L, U
Output: g
1 g := x := L + (U − L) U(0, 1);
2 for n := 1 to n_max − 1 do
3   repeat
4     ε := (U − L) N(0, 1);
5     y := x + α ε;
6   until y ∈ Ω;
7   x := y;
8   g := arg min{f(g), f(x)};

Algorithm 3.2: Domain controlled random walk pseudocode

Figure 3.8 displays an example realisation of a domain controlled random walk for n_max = 100. This can be compared with figure 3.7.

Figure 3.8: 100 evaluations of domain controlled random walk

3.2.3 Position controlled random walk

A sequence of random points (x_n)_{n=0}^{n_max} is generated in a different manner in comparison to the Wiener process (3.13) or the uncontrolled random walk (3.22). First of all, a temporary point y is created

y := g + α ε. (3.27)

Then the next point x_{n+1} of the sequence is accepted only if the objective function value f(y) is lower than that of the predecessor

x_{n+1} := { y, if f(y) < f(g);  x_n, otherwise }. (3.28)

Thus, the predecessor is always regarded as the current global best g. Otherwise the predecessor is preserved and the next new point is generated randomly. Moreover, the step constant α is given by the following equation

α := 1/(10√D), (3.29)

being one among many possibilities. Ultimately, the update equation is now

x_{n+1} := g + (U − L)/(10√D) N(0, 1). (3.30)

Figure 3.9: 100 evaluations of position controlled random walk

The position controlled random walk is one of the simplest global optimisation, nature-inspired algorithms. It is shown in listing 3.3.

Input: α, n_max, L, U
Output: g
1 g := x := L + (U − L) U(0, 1);
2 for n := 1 to n_max − 1 do
3   ε := (U − L) N(0, 1);
4   x := g + α ε;
5   if f(x) − f(g) < 0 then g := x;

Algorithm 3.3: Position controlled random walk pseudocode
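A compact NumPy sketch of listing 3.3 is given below; the box and the sphere objective are again illustrative assumptions of mine.

import numpy as np

def position_controlled_random_walk(f, L, U, n_max, seed=0):
    """Algorithm 3.3: every trial point is generated around the current global best g."""
    rng = np.random.default_rng(seed)
    L, U = np.asarray(L, float), np.asarray(U, float)
    D = L.size
    alpha = 1.0 / (10.0 * np.sqrt(D))                # step constant, eq. (3.29)
    g = L + (U - L) * rng.uniform(size=D)
    fg = f(g)
    for _ in range(n_max - 1):
        x = g + alpha * (U - L) * rng.standard_normal(D)   # eq. (3.30)
        fx = f(x)
        if fx < fg:                                  # accept only improvements
            g, fg = x, fx
    return g, fg

sphere = lambda x: float(np.sum(x ** 2))             # illustrative objective (assumed)
print(position_controlled_random_walk(sphere, [-5, -5], [5, 5], 1000))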

Figure 3.9 displays an example realisation of a position controlled random walk for n_max = 100. The solid polyline represents the sequence of points forming the optimisation path. Separate points depict probing of the search domain, when the condition f(y) < f(g) was not satisfied.



Figure 3.10: 100 evaluations of simulated annealing

3.3 Simulated annealing

Simulated annealing [14] is somewhat similar to the position controlled random walk given by equation (3.27)

y := x_n + α (U − L) N(0, 1). (3.31)

The only difference is that the predecessor x_n does not have to be better than the temporary point y in terms of the objective function value. The function difference ∆ is customarily used as an improvement indicator

∆ := f(y) − f(x_n). (3.32)

In this way, the next point is always accepted if ∆ < 0. Alternatively, it may also be accepted if ∆ > 0 with a certain probability p

x_{n+1} := { y, if ∆ < 0 or p > U(0, 1);  x_n, otherwise }. (3.33)

The probability, however, has to fulfil at least two conditions. It should decrease as the algorithm progresses. Furthermore, it should also decrease as ∆ increases. In order to fulfil these conditions, the Boltzmann distribution is taken into consideration, or in fact the ratio of a Boltzmann distribution for two states,

p ∼ e^{−∆E/(kT)} (3.34)

as it describes the distribution of particle energy differences ∆E over various states. It is also loosely connected with the transition of a physical system or, in this case, annealing, i.e. slow cooling of metals with temperature T. The slow cooling assumption allows for another simplification, namely an equilibrium state at all times, which leads to minimum energy configurations of particles. Further, the Boltzmann constant is assumed to be k = a and the energy difference

∆E ∼ a∆. (3.35)

Thus, the probability p of acceptance of a worse solution is now given by the following approximation

p ∼ e^{−∆/T} (3.36)

and the next point may be accepted if ∆ > 0 and

e^{−∆/T} > U(0, 1). (3.37)

Proportion (3.36) fulfils the condition that p decreases as ∆ increases. In order to implement the remaining requirement, i.e. that p should decrease as the algorithm progresses, it is necessary to introduce the so called cooling schedule

T_{n+1} ≤ T_n. (3.38)

There are several possibilities, for instance

T_{n+1} := T_n − δ n, (3.39a)
T_{n+1} := T_n δ^{1/n_max}, (3.39b)
T_{n+1} := T_0 δ (3.39c)

where δ is another constant of the algorithm, together with the step size constant α and the initial temperature T_0. The cooling rate (3.39), controlled by the constant δ, cannot be too quick, in order to avoid getting trapped in local minima, nor too slow, because the algorithm then becomes too costly.

Input: T, α, δ, n_max, L, U
Output: g
1 g := x := L + (U − L) U(0, 1);
2 for n := 1 to n_max − 1 do
3   T := T δ^{1/n_max};
4   ε := (U − L) N(0, 1);
5   y := x + α ε;
6   ∆ := f(y) − f(x);
7   if ∆ < 0 or e^{−∆/T} > U(0, 1) then x := y;
8   g := arg min{f(g), f(x)};

Algorithm 3.4: Simulated annealing pseudocode

Simulated annealing is another example of a global optimisation, nature-inspired algorithm. It is shown in listing 3.4. Figure 3.10 displays an example realisation of simulated annealing for n_max = 100. The solid polyline represents the sequence of points forming the optimisation path. Separate points depict probing of the search domain, when neither ∆ < 0 nor e^{−∆/T} > U(0, 1) was satisfied.
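A NumPy sketch of listing 3.4 with the geometric cooling schedule (3.39b) follows. The values of T0, α and δ, the box and the sphere objective are illustrative assumptions of mine.

import numpy as np

def simulated_annealing(f, L, U, n_max, T0=1.0, alpha=0.05, delta=1e-3, seed=0):
    """Algorithm 3.4: random walk with probabilistic acceptance of worse points."""
    rng = np.random.default_rng(seed)
    L, U = np.asarray(L, float), np.asarray(U, float)
    D = L.size
    g = x = L + (U - L) * rng.uniform(size=D)
    fg = fx = f(x)
    T = T0
    for _ in range(n_max - 1):
        T *= delta ** (1.0 / n_max)                  # cooling, eq. (3.39b)
        y = x + alpha * (U - L) * rng.standard_normal(D)   # eq. (3.31)
        fy = f(y)
        d = fy - fx                                  # improvement indicator, eq. (3.32)
        if d < 0 or np.exp(-d / T) > rng.uniform():  # acceptance rule, eqs (3.33), (3.37)
            x, fx = y, fy
        if fx < fg:                                  # track the global best
            g, fg = x, fx
    return g, fg

sphere = lambda x: float(np.sum(x ** 2))             # illustrative objective (assumed)
print(simulated_annealing(sphere, [-5, -5], [5, 5], 1000))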



3.4 Random jumping

Random jumping is the simplest and most naive way of dealing with objective function optimisation. Simply, a sequence of completely random points is generated with no relation to one another whatsoever

x_{n+1} := α ε. (3.40)

The random vector ε can be drawn from the standard normal distribution according to equation (3.23), for instance, or from any other distribution. The formula for generating points is given by

x_{n+1} := α (U − L) U(0, 1). (3.41)

Figure 3.11: 100 evaluations of random jumping

Figure 3.11 displays an example plot of random jumping for n_max = 100. The solid polyline represents the order of point generation and it does not constitute any optimisation path. Despite the fact that the algorithm is simple and naive, it may, however, perform better than an uncontrolled random walk.


Chapter 4

Multi-point, derivative-free algorithms

4.1 Introduction

Metaheuristic and nature-inspired algorithms are two key concepts in global optimisation. Also, nature-inspired metaheuristic is a commonly used term. Actually, all algorithms in this chapter can be classified as metaheuristics inspired by nature.

4.1.1 (Meta)heuristic

A heuristic is typically a trial and error approach to problem solving or, in other words, a method developed on the basis of experience. A metaheuristic (higher level heuristic) is a non-problem-specific, stochastic algorithm combining randomisation and local search. Properties of (meta)heuristics are:

• There is no guarantee that a globally optimal solution can be found. This is because (meta)heuristic algorithms are approximate in nature.
• A sufficiently good solution can be found in a reasonable amount of time.
• A balance between exploitation and exploration should exist. The former concept (exploitation) has a local search character, whereas the latter (exploration) is of global nature.

Moreover, most of the (meta)heuristic optimisation algorithms are global due to their stochastic nature.

4.1.2 Nature-inspired algorithms

Metaheuristic, nature-inspired algorithms can be classified according to their sources of inspiration [7]:

• Physics-based. Inspiration comes from physics or chemistry. Certain laws are imitated. Examples of physics-inspired algorithms include, for instance, the Gravitational Search Algorithm. Two more examples, known from the previous chapter, are Random Walk and Simulated Annealing. However, these are also classified as single-point algorithms.
• Bio-inspired. Inspiration comes from biology. Examples of bio-inspired algorithms are Genetic Algorithms, Differential Evolution and the Flower Pollination Algorithm. Furthermore, bio-inspired algorithms are not swarm intelligence based.
• Swarm intelligence based. Inspiration comes from swarm intelligence, i.e. the collective behaviour of decentralised agents following a small set of simple rules. Examples are Particle Swarm Optimisation, the Firefly Algorithm, the Bat Algorithm and Cuckoo Search.
• Other methods.

4.2 Physics-based algorithms

4.2.1 Gravitational search algorithm

The gravitational search algorithm [19] mimics Newton’s law of gravitation, which states that every mass attracts every other individual mass by a force f_ij proportional to the product m_i m_j of the two individual masses and inversely proportional to the square of the distance ‖x_ij‖ between them. The force is directed along the line x_ij/‖x_ij‖ connecting both masses.

If the gravitational potential is

V_ij = −G m_i m_j / ‖x_ij‖ (4.1)

then the force acting between the two masses is given by the negative gradient of the potential V_ij, namely f_ij = −∇_{x_i} V_ij. G stands for the gravitational constant. A vector form of Newton’s law of gravitation is now given by

f_ij = G (m_i m_j / ‖x_ij‖²) (x_ij / ‖x_ij‖). (4.2)

Considering a system which consists of N individual masses m_i, it is possible to utilise Newton’s equation of motion in order to track the evolution in time of all individual masses

m_i a_i = Σ_{j=1, j≠i}^{N} f_ij. (4.3)

The evolution in time depends solely on the potential, provided that the initial positions and velocities are known. Furthermore, Newton’s equation of motion (4.3) in the following form

d²x_i/dt² = a_i = (1/m_i) Σ_{j=1, j≠i}^{N} f_ij (4.4)

can now be discretised and solved by means of the Stormer-Verlet method, for instance. A simpler approach, known as the semi-implicit Euler method, is used instead



as accuracy of the time evolution is not an issue here. Thus, equation (4.4) is equivalent to a pair of differential equations

dx_i/dt = v_i, (4.5a)
dv_i/dt = a_i. (4.5b)

The discrete version of the above system is obtained from the linear Taylor expansion of velocity

v_i(t + ∆t) ≈ v_i(t) + a_i(t) ∆t (4.6)

and position

x_i(t + ∆t) ≈ x_i(t) + v_i(t + ∆t) ∆t. (4.7)

What is more, the linear Taylor expansion means that this method is first order accurate, in contrast with the Stormer-Verlet method, which is second order accurate. The discrete form (4.6) and (4.7) of the system (4.5) indicates that the initial positions and velocities should be known.

In the gravitational search algorithm, masses are associated with agents (points) in such a way that the objective function values are proportional to the individual masses. Heavier masses attract lighter masses by a gravitational force analogous to (4.2). According to equation (4.4), the acceleration of an individual agent is inversely proportional to its mass; hence, the heavier the mass, the slower its movement. This provides a mechanism for exploitation, whereas exploration exists due to lighter masses and faster movements. Positions of agents are associated with solutions in terms of the arguments of the objective function.

In order to account for the mass conservation Σ_{i=1}^{N} M_i = 1, two auxiliary points are calculated every iteration, namely the current best agent b

b := arg min_{x^n_i} f(x^n_i) (4.8)

and the current worst agent w

w := arg max_{x^n_i} f(x^n_i). (4.9)

The actual mass per iteration is then calculated as

m^n_i := (f(x^n_i) − f(w)) / (f(b) − f(w)). (4.10)

However, the above equation does not account for the mass conservation. This is because, as the algorithm progresses, both b and w become smaller. Hence, the individual masses are normalised in the following way

M^n_i := m^n_i / Σ_{i=1}^{N} m^n_i. (4.11)

Thus, the mass of the system is conserved, Σ_{i=1}^{N} M_i = 1, and simply redistributed among individual agents according to the objective function f values.



The force coming from agent j acting on agent i is similar to that given by equation (4.2), namely

f^n_ij := G^n M^n_i M^n_j U(0, 1) (x^n_j − x^n_i) / (‖x^n_i − x^n_j‖ + ε). (4.12)

There are, however, three differences. Firstly, and most importantly, it is no longer an inverse square law, as the force is not proportional to the inverse square of the distance ‖x_ij‖²; it involves simply the distance ‖x_ij‖ instead. In order to avoid division by zero, a small constant ε is always added to the denominator. As the algorithm progresses, the gravitational constant is reduced according to

G^n := G_0 e^{−α n/n_max}. (4.13)

An algorithm constant α is introduced in order to control the reduction of the gravitational constant. Another algorithm constant is the initial value G_0. Secondly, the difference (x^n_j − x^n_i) between the two individual agents is not normalised. Lastly, randomisation is introduced to the force by means of the realisation of a stochastic vector variable with the uniform continuous distribution U(0, 1).

Figure 4.1: 400 evaluations of gravitational search algorithm

The discrete Newton’s equation of motion (4.4) now makes it possible to calculate the acceleration a^n_i of individual agents

a^n_i := (1/M^n_i) Σ_{j=1, j≠i}^{N} f^n_ij. (4.14)

Assuming a unit time step ∆t := 1, the updated velocity, according to equation (4.6), is now

v^{n+1}_i := U(0, 1) v^n_i + a^n_i (4.15)

where an additional stochastic vector variable with the uniform continuous distribution U(0, 1) is added in order to introduce randomisation. Finally, the next position of individual agents is updated directly according to equation (4.7), resulting in

x^{n+1} := x^n + v^{n+1}. (4.16)

The algorithm is shown in listing 4.1. Figure 4.1 displays an example realisation of the gravitational search algorithm for n_max = 20 and N = 20 agents, which is equivalent to 400 evaluations of the objective function. The solid polylines represent trajectories of individual agents.

Input: α, G_0, N, n_max, L, U
Output: g
1 for i := 0 to N − 1 do
2   x_i := L + (U − L) U(0, 1);
3   v_i := 0;
4 for n := 0 to n_max − 1 do
5   b := arg min_{x_i} f(x_i);
6   w := arg max_{x_i} f(x_i);
7   if n = 0 then g := b else g := arg min{f(g), f(b)};
8   M := (f(x) − f(w)) / (f(b) − f(w));
9   M := M / Σ_{i=1}^{N} M_i;
10  G := G_0 e^{−α n/n_max};
11  E := 0;
12  for i := 0 to N − 1 do
13    for j := 0 to N − 1 do
14      if i ≠ j then
15        E_i := E_i + M_j U(0, 1) (x_j − x_i) / (‖x_j − x_i‖ + ε);
16  v := U(0, 1) v + G E;
17  x := x + v;

Algorithm 4.1: Gravitational search algorithm pseudocode
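A compact NumPy sketch along the lines of listing 4.1 is given below. The constants G0, α and the small ε, as well as the box and the sphere objective, are illustrative assumptions of mine; the double loop mirrors lines 12–15 of the listing.

import numpy as np

def gravitational_search(f, L, U, N=20, n_max=20, G0=1.0, alpha=10.0, eps=1e-12, seed=0):
    """Sketch of the gravitational search algorithm, eqs (4.10)-(4.16)."""
    rng = np.random.default_rng(seed)
    L, U = np.asarray(L, float), np.asarray(U, float)
    D = L.size
    x = L + (U - L) * rng.uniform(size=(N, D))
    v = np.zeros((N, D))
    fx = np.array([f(p) for p in x])
    g = x[np.argmin(fx)].copy()
    for n in range(n_max):
        fx = np.array([f(p) for p in x])
        b, w = fx.min(), fx.max()
        if b < f(g):                                  # keep the best agent found so far
            g = x[np.argmin(fx)].copy()
        m = (fx - w) / (b - w + eps)                  # raw masses, eq. (4.10)
        M = m / (m.sum() + eps)                       # normalised masses, eq. (4.11)
        G = G0 * np.exp(-alpha * n / n_max)           # decaying constant, eq. (4.13)
        a = np.zeros((N, D))
        for i in range(N):
            for j in range(N):
                if i != j:                            # accumulated attraction, eqs (4.12), (4.14)
                    diff = x[j] - x[i]
                    a[i] += M[j] * rng.uniform(size=D) * diff / (np.linalg.norm(diff) + eps)
        v = rng.uniform(size=(N, D)) * v + G * a      # velocity update, eq. (4.15)
        x = x + v                                     # position update, eq. (4.16)
    return g, f(g)

sphere = lambda p: float(np.sum(p ** 2))              # illustrative objective (assumed)
print(gravitational_search(sphere, [-5, -5], [5, 5]))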

4.3 Bio-inspired algorithms

4.3.1 Genetic algorithms

4.3.1.1 Evolutionary algorithms

Evolutionary Algorithms (EA) are multi-point (population based) optimisation algorithms. EA are classified as bio-inspired in the sense that they mimic Darwinian evolution. They evolve better solutions by means of recombination, mutation and survival. Also, EA operate on populations, as do other multi-point algorithms. One of the huge advantages of EA over traditional optimisation methods is that they typically do not need any additional information about the objective function. Another desirable property of EA is parallelism. This is because all individuals of a population (generation) perform independently. Further, randomisation of the EA is introduced through the probabilities of crossover and mutation. One can distinguish at least two groups of algorithms:

• Genetic Algorithms (GA) [10, 12]. A binary representation of individuals is used. This means that they are encoded as vectors of bits and all genetic operators, such as crossover, are performed on these vectors. The disadvantage of this approach is a discretisation error due to the limited length of the vectors. This is one of the reasons why genetic algorithms with floating-point representation are better suited for continuous optimisation.
• Evolution Strategies (ES) [17]. A floating-point representation is used in order to represent individuals. All genetic operators operate directly on floating-point numbers, meaning that no discretisation error is introduced.

Traditionally, both representations, i.e. binary and floating-point, are commonly termed genetic algorithms.

4.3.1.2 Binary representation

[Figure 4.2: Genetic algorithms flowchart — Initialisation → Population → Parent selection → Recombination (Parents → Offspring) → Survivor selection → Converged? (No: back to Parent selection; Yes: Stop)]

[Figure 4.3: Crossover and mutation of binary strings]

The flowchart of the genetic algorithm is shown in figure 4.2. The first step, called ‘Initialisation’, includes encoding all individuals. In this case, a binary representation is chosen. Also, a random initial population is created and the fitness function is evaluated. Thus, the ‘Population’ step is achieved. Next, parents for further generations are selected in order to produce offspring. This can be achieved by various methods. The two most popular and common methods are roulette wheel and tournament selection, and the process is called ‘Parent selection’. The next step is recombination, where offspring are produced. Typically, two parents produce two offspring by means of genetic operators such as crossover (figure 4.3 – top) with a high crossover probability p_c. A random point is selected and the bits to the left of that point, taken from the first parent, are combined with the bits to the right, taken from the second parent. As a result, the two offspring inherit a portion of each parent. The next genetic operator is random mutation with a low probability p_m. This results in altering a certain number of bits, as shown in figure 4.3 (bottom). Mutation alters a 1 to 0 or, conversely, a 0 to 1. The next generation (population) is then created through the process called ‘Survivor selection’. Two strategies are possible, discussed further below. Finally, the new population is evaluated by means of the objective function and a stop criterion is checked in the last step, ‘Converged?’. Detailed descriptions of steps like selection, recombination methods or survivor selection are given in the next paragraphs.

4.3.1.3 Floating-point representation

Input: p_c, p_m, T, N, n_max, L, U
Output: g
1 for i := 0 to N − 1 do
2   x_i := L + (U − L) U(0, 1);
3   y_i := 0;
4 g := arg min_{x_i} f(x_i);
5 for n := 1 to n_max − 1 do
6   for i := 1 to N − 1 do
7     a := Tournament(x, T);
8     b := Tournament(x, T);
9     p_1 := x_a;
10    p_2 := x_b;
11    (c_1, c_2) := Crossover(p_1, p_2, p_c);
12    y_i := Mutation(c_1, i, p_m);
13    y_{i+1} := Mutation(c_2, i, p_m);
14    i := i + 2;
15  l := arg min_{x_i} f(x_i);
16  g := arg min{f(g), f(l)};
17  x := Selection(x, y);

Algorithm 4.2: Genetic algorithm pseudocode

The genetic algorithm in pseudocode form, regardless of how the individuals are represented, is shown in listing 4.2. However, details of the internal functions are given for the floating-point representation, since it is better suited for continuous optimisation. Lines 7 and 8 represent parent selection steps by means of tournament selection, shown in listing 4.3. The tournament size T is necessary in order to select T individuals out of a parent population of N members. When T individuals are selected, the best of them is chosen to be a parent. Typically, T is low for small populations, i.e. 2 or 3, the lowest value being 2. Obviously, tournament selection is of random character and the whole process resembles a competition for selection in order to pass genetic material to offspring. Lines 1 and 3 in listing 4.3 represent random variates of a discrete uniform distribution in order to select random members of the parent population. Once parents are selected, the crossover takes place (line 11 in listing 4.2).

Input: T, N, x
Output: k
1 k := U{0, N − 1};
2 for i := 1 to T − 1 do
3   j := U{0, N − 1};
4   if f(x_j) < f(x_k) then k := j;

Algorithm 4.3: GA parent selection (tournament) pseudocode

Crossover provides mixing of the solutions. Several methods are in use. The most popular, the arithmetical crossover, is discussed here, being simple and elegant. Two parents x_1, x_2 are crossed with probability p_c. If U(0, 1) < p_c then a random number drawn from a continuous uniform distribution is generated

a := U(0, 1). (4.17)

Further, the two parent vectors x_1 and x_2 produce two offspring vectors y_1 and y_2 according to

y_1 := a x_1 + (1 − a) x_2, (4.18a)
y_2 := a x_2 + (1 − a) x_1. (4.18b)

This also means that the two offspring vectors are linear combinations of the two parent vectors. This method, moreover, guarantees that y_1, y_2 remain within the optimisation domain Ω if either the problem is unconstrained or the domain Ω is constrained and convex (e.g. box constraints). The arithmetical crossover pseudocode is shown in listing 4.4.

Input: x_1, x_2, p_c
Output: y_1, y_2
1 y_1 := x_1;
2 y_2 := x_2;
3 if U(0, 1) < p_c then
4   a := U(0, 1);
5   y_1 := a x_1 + (1 − a) x_2;
6   y_2 := a x_2 + (1 − a) x_1;

Algorithm 4.4: GA arithmetical crossover pseudocode

As soon as parents produce offspring, the children are mutated (lines 12, 13 in listing 4.2) with probability p_m. Mutation increases the diversity of the population and provides a mechanism for escaping from local optima. Two types of mutation are in common use:
• Uniform
• Nonuniform



A child is uniformly mutated if U(0, 1) < p_m. Then a random individual is generated within the search space according to the following equation

x_i := L + (U − L) U(0, 1). (4.19)

The uniform mutation pseudocode is shown in listing 4.5.

Input: x_i, p_m, L, U
Output: y_i
1 y_i := x_i;
2 if U(0, 1) < p_m then
3   y_i := L + (U − L) U(0, 1);

Algorithm 4.5: GA uniform mutation pseudocode

Nonuniform mutation takes place if U(0, 1) < p_m. If so, then an additional number ∆ ∈ [0; 1] is generated

∆ := 1 − U(0, 1)^{(1 − n/n_max)²}. (4.20)

As the algorithm progresses the value of ∆ decreases. This leads to mutation damping as the algorithm approaches its end. Finally, the components of a mutated child are given by

x_ik := { x_ik + (U_k − x_ik) ∆, if U{0, 1} = 0;  x_ik − (x_ik − L_k) ∆, otherwise } (4.21)

where x_i = (x_i1, . . . , x_iD). The nonuniform mutation pseudocode is shown in listing 4.6.

Input: x_i, p_m, D, n, n_max, L, U
Output: y_i
1 y_i := x_i;
2 for k := 0 to D − 1 do
3   if U(0, 1) < p_m then
4     ∆ := 1 − U(0, 1)^{(1 − n/n_max)²};
5     if U{0, 1} = 0 then
6       y_ik := y_ik + (U_k − y_ik) ∆;
7     else
8       y_ik := y_ik − (y_ik − L_k) ∆;

Algorithm 4.6: GA nonuniform mutation pseudocode
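The three floating-point operators can be sketched directly in NumPy. The function names and the small usage example below are mine; only the formulas (4.18), (4.20), (4.21) and listings 4.3–4.6 are followed.

import numpy as np

def tournament(f, x, T, rng):
    """Listing 4.3: pick the best of T randomly chosen individuals, return its index."""
    k = rng.integers(len(x))
    for _ in range(T - 1):
        j = rng.integers(len(x))
        if f(x[j]) < f(x[k]):
            k = j
    return k

def arithmetical_crossover(p1, p2, pc, rng):
    """Listing 4.4: offspring are convex combinations of the parents, eq. (4.18)."""
    if rng.uniform() < pc:
        a = rng.uniform()
        return a * p1 + (1 - a) * p2, a * p2 + (1 - a) * p1
    return p1.copy(), p2.copy()

def nonuniform_mutation(child, pm, n, n_max, L, U, rng):
    """Listing 4.6: per-component mutation, damped as the run progresses, eqs (4.20)-(4.21)."""
    y = child.copy()
    for k in range(y.size):
        if rng.uniform() < pm:
            delta = 1 - rng.uniform() ** ((1 - n / n_max) ** 2)
            if rng.integers(2) == 0:
                y[k] += (U[k] - y[k]) * delta
            else:
                y[k] -= (y[k] - L[k]) * delta
    return y

rng = np.random.default_rng(0)
L, U = np.array([-5.0, -5.0]), np.array([5.0, 5.0])
f = lambda x: float(np.sum(x ** 2))                   # illustrative objective (assumed)
pop = [L + (U - L) * rng.uniform(size=2) for _ in range(10)]
i, j = tournament(f, pop, 3, rng), tournament(f, pop, 3, rng)
c1, c2 = arithmetical_crossover(pop[i], pop[j], 0.9, rng)
print(nonuniform_mutation(c1, 0.1, n=1, n_max=20, L=L, U=U, rng=rng))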

The last main step in the genetic algorithm (listing 4.2) is ‘Selection’ (line 17), in fact being survivor selection. The purpose of this is to pass the best solutions on to the next generations. Traditionally, µ denotes the total number of parent vectors x_i and λ stands for the number of offspring vectors y_i. In this case, both are equal, µ = λ = N. In general, however, at least two selection strategies may be distinguished, keeping in mind that λ ≥ µ:

• (µ, λ) strategy
• (µ + λ) strategy

The (µ, λ) strategy selects the best µ out of λ offspring vectors y to become the next generation of parent vectors x. Listing 4.7 shows the straightforward (µ, λ) strategy pseudocode (µ = λ).

Input: x, y
Output: x
1 x := y;

Algorithm 4.7: GA (µ, λ) strategy pseudocode

The (µ + λ) strategy creates the next parent vector generation with the best µvectors from the combined parent x and offspring y population of µ+λ vectors. The(µ+ λ) strategy pseudocode is shown in listing 4.8.

Input: x, y, N
Output: x
1 x := x ∪ y;
2 Sort(x_i) based on f(x_i);
3 x := x \ {x_N, . . . , x_{2N−1}};

Algorithm 4.8: GA (µ + λ) strategy pseudocode

Figure 4.4 displays an example realisation of the genetic algorithm for n_max = 20 and N = 20 individuals, which is equivalent to 400 evaluations of the objective function. As usual, the solid polylines represent trajectories of individuals.

Figure 4.4: 400 evaluations of genetic algorithm



4.3.2 Differential evolution

Differential evolution [20] is a simple, fast and effective metaheuristic algorithm. Like other metaheuristic algorithms, DE does not need any additional information about the objective function. DE is similar to genetic algorithms. What is more, DE is regarded as the next step in the evolution of genetic algorithms. Crossover and mutation are applied to floating-point vectors, which makes it similar to GA. Additionally, selection is also present in DE. Most importantly, an explicit update equation is provided, in contrast with GA.

Input: C, F, N, n_max, L, U
Output: g
1 for i := 0 to N − 1 do
2   x_i := L + (U − L) U(0, 1);
3   y_i := 0;
4 g := arg min_{x_i} f(x_i);
5 for n := 1 to n_max − 1 do
6   for i := 0 to N − 1 do
7     K := H(C − U(0, 1));
8     K_{U{0, D−1}} := 1;
9     a := RandomPermutation({0, . . . , N − 1} \ {i});
10    y_i := K (x_{a_3} + F (x_{a_1} − x_{a_2})) + (1 − K) x_i;
11 for i := 0 to N − 1 do
12   x_i := arg min{f(x_i), f(y_i)};
13   g := arg min{f(g), f(y_i)};

Algorithm 4.9: Differential evolution pseudocode

Differential evolution consists of four main steps, namely selection of three different individuals, mutation, crossover and selection. The first step, i.e., choosing three different random individuals x^n_a, x^n_b, x^n_c out of a population x, means that {x^n_a, x^n_b, x^n_c} ⊆ x. Additionally, the population size is N ≥ 4. Once three individuals are selected, a mutant vector v_i is generated according to

v_i := x^n_a + F (x^n_b − x^n_c). (4.22)

The scale factor F, or the so called differential weight F ∈ ]0; 1[, is used in order to control the rate of population development. Furthermore, the trial vector y_i is created via binomial crossover with probability C

y_ij := { v^n_ij, if U(0, 1) < C;  x^n_ij, otherwise }. (4.23)

Other crossover techniques, such as exponential crossover, are possible. The crossover probability C ∈ [0; 1] regulates how much of the mutant vector is copied to the trial vector. Alternatively, one can combine mutation and binomial crossover into a single vector equation by means of the Heaviside step (theta) function H. Introducing the auxiliary D-dimensional vector K consisting of 0s and 1s

K := H(C − U(0, 1)), (4.24)

it is now possible to combine equations (4.22) and (4.23), i.e. mutation and crossover, into a single vector formula

y_i := K (x^n_a + F (x^n_b − x^n_c)) + (1 − K) x^n_i. (4.25)

The above equation is present in line 10 of the DE pseudocode in listing 4.9 together with equation (4.24) (line 7). The three different and randomly chosen individuals are indexed on the basis of a random permutation vector a (line 9). Additionally, line 8 corresponds to setting a random index of the vector K to 1 in order to guarantee that y_i ≠ x^n_i.

The last step is selection (line 12 in algorithm 4.9). Simply, the better solution of the trial vector y_i and the original individual x^n_i, in terms of the objective function value, is passed on to the next generation x^{n+1}_i

x^{n+1}_i := { y_i, if f(y_i) < f(x^n_i);  x^n_i, otherwise }. (4.26)

This step is fully deterministic, in contrast with mutation and crossover. Furthermore, the four essential steps are applied to all members x^n_i of the population until the new population x^{n+1}_i is created. The algorithm terminates if a given stop criterion is satisfied. This could be, for instance, a maximum number of objective function evaluations.
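A NumPy sketch of the DE/Rand/1/Bin scheme following equations (4.24)–(4.26) is given below. The settings F = 0.7 and C = 0.9, the box and the sphere objective are illustrative assumptions of mine.

import numpy as np

def differential_evolution(f, L, U, N=20, n_max=20, F=0.7, C=0.9, seed=0):
    """DE/Rand/1/Bin sketch: mutation + binomial crossover + greedy selection."""
    rng = np.random.default_rng(seed)
    L, U = np.asarray(L, float), np.asarray(U, float)
    D = L.size
    x = L + (U - L) * rng.uniform(size=(N, D))
    fx = np.array([f(p) for p in x])
    for _ in range(n_max - 1):
        for i in range(N):
            a, b, c = rng.choice([j for j in range(N) if j != i], size=3, replace=False)
            K = (rng.uniform(size=D) < C).astype(float)           # crossover mask, eq. (4.24)
            K[rng.integers(D)] = 1.0                              # force at least one mutant component
            y = K * (x[a] + F * (x[b] - x[c])) + (1 - K) * x[i]   # eq. (4.25)
            fy = f(y)
            if fy < fx[i]:                                        # greedy selection, eq. (4.26)
                x[i], fx[i] = y, fy
    best = np.argmin(fx)
    return x[best], fx[best]

sphere = lambda p: float(np.sum(p ** 2))                          # illustrative objective (assumed)
print(differential_evolution(sphere, [-5, -5], [5, 5]))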

Figure 4.5: 400 evaluations of differential evolution

Figure 4.5 presents an example realisation of differential evolution for n_max = 20 and N = 20 individuals. This is equivalent to 400 evaluations of the objective function. Trajectories of individuals are represented by means of the solid polylines.



Various differential evolution variants are in use, each of which has its own notation. These include, among others:

• DE/Rand/1/Bin
  y_i := x^n_a + F (x^n_b − x^n_c). (4.27)
• DE/Best/1/Bin
  y_i := g^n + F (x^n_b − x^n_c). (4.28)
• DE/Rand/m/Bin
  y_i := x^n_{a_1} + Σ_{j=1}^{m} F_j (x^n_{a_2j} − x^n_{a_2j+1}). (4.29)
• DE/Best/m/Bin
  y_i := g^n + Σ_{j=1}^{m} F_j (x^n_{a_2j} − x^n_{a_2j+1}). (4.30)

The word ‘Rand’ stands for the first, randomly chosen, individual x_a whereas ‘Best’ represents the current global best g^n. The third entry in the above notation (1 or m) shows the number of individual differences added to the first individual. Finally, ‘Bin’ describes a binomial crossover.

4.3.3 Flower pollination algorithm

The flower pollination algorithm [29], as the name suggests, is inspired by the phenomenon of flower pollination. Two types of pollination are considered, i.e. global and local. Global pollination, taking place over long distances, is mimicked by a Levy flight and termed global search. Local pollination (short distances) is mimicked by a local search. The interaction between local and global pollination is controlled by a probability p. In other words, the FPA algorithm is simply a combination of a global and a local random walk. Thus, the update formula is

y := { x^n_i + α ε, if p < U(0, 1);  x^n_i + U(0, 1)(x^n_j − x^n_k), otherwise } (4.31)

where the random vector ε is drawn from the symmetrical Levy stable distribution (3.18)

ε := σ N(0, 1) / |N(0, 1)|^{1/λ} (g − x^n_i). (4.32)

Two different individuals x^n_j and x^n_k are taken randomly from the current population and their difference is scaled by a random variate of the continuous uniform distribution. Consequently, this resembles a local random search (walk). Equation (4.31) is present in lines 7–12 in algorithm 4.10. If the new individual y is better than its predecessor x^n_i then it is passed on to the new population

x^{n+1}_i := { y, if f(y) < f(x^n_i);  x^n_i, otherwise }. (4.33)



Input: α, λ, N, n_max, p, L, U
Output: g
1 for i := 0 to N − 1 do
2   x_i := L + (U − L) U(0, 1);
3 g := arg min_{x_i} f(x_i);
4 σ := (Γ(1 + λ) sin(πλ/2) / (Γ((1 + λ)/2) λ 2^{(λ−1)/2}))^{1/λ};
5 for n := 1 to n_max − 1 do
6   for i := 0 to N − 1 do
7     if p < U(0, 1) then
8       ε := σ N(0, 1) / |N(0, 1)|^{1/λ} (g − x_i);
9       y := x_i + α ε;
10    else
11      R := RandomPermutation(0, . . . , N − 1);
12      y := x_i + U(0, 1) (x_{R_1} − x_{R_2});
13    CheckRange(y);
14    x_i := arg min{f(x_i), f(y)};
15    g := arg min{f(x_i), f(g)};

Algorithm 4.10: Flower pollination algorithm pseudocode

The above condition is represented by line 14 in algorithm 4.10. Figure 4.6 shows an example of a flower pollination algorithm realisation for n_max = 20 and N = 20 individuals (equivalent to 400 evaluations of the objective function). Trajectories of individuals are represented by means of the solid polylines.
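The sketch below follows listing 4.10; the settings p = 0.8, α = 0.1 and λ = 1.5, the box and the sphere objective are illustrative assumptions of mine, and the CheckRange step is realised here as simple clipping.

import numpy as np
from math import gamma, sin, pi

def flower_pollination(f, L, U, N=20, n_max=20, p=0.8, alpha=0.1, lam=1.5, seed=0):
    """Sketch of the flower pollination algorithm, eqs (4.31)-(4.33)."""
    rng = np.random.default_rng(seed)
    L, U = np.asarray(L, float), np.asarray(U, float)
    D = L.size
    sigma = (gamma(1 + lam) * sin(pi * lam / 2)
             / (gamma((1 + lam) / 2) * lam * 2 ** ((lam - 1) / 2))) ** (1 / lam)
    x = L + (U - L) * rng.uniform(size=(N, D))
    fx = np.array([f(q) for q in x])
    g = x[np.argmin(fx)].copy()
    for _ in range(n_max - 1):
        for i in range(N):
            if p < rng.uniform():
                # global pollination: Levy-flight step towards the best flower, eq. (4.32)
                eps = (sigma * rng.standard_normal(D)
                       / np.abs(rng.standard_normal(D)) ** (1 / lam)) * (g - x[i])
                y = x[i] + alpha * eps
            else:
                # local pollination: scaled difference of two random flowers, eq. (4.31)
                j, k = rng.choice(N, size=2, replace=False)
                y = x[i] + rng.uniform(size=D) * (x[j] - x[k])
            y = np.clip(y, L, U)                      # CheckRange (assumed as clipping)
            fy = f(y)
            if fy < fx[i]:
                x[i], fx[i] = y, fy
                if fy < f(g):
                    g = y.copy()
    return g, f(g)

sphere = lambda q: float(np.sum(q ** 2))              # illustrative objective (assumed)
print(flower_pollination(sphere, [-5, -5], [5, 5]))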

Figure 4.6: 400 evaluations of flower pollination algorithm



4.4 Swarm intelligence based algorithms

4.4.1 Particle swarm optimisation

Particle swarm optimisation [13] is a bio-inspired, and at the same time swarm intelligence based, global optimisation algorithm. Consequently, the algorithm is inspired by swarm or collective behaviour frequently observed among certain animals. What is important, there is no central coordination and swarm behaviour is regarded as a collective motion of agents following a small set of simple rules. Agents interact with one another at the local scale. The whole process leads to intelligent-like global behaviour.

Input: α, β, θ_0, N, n_max, δ, L, U
Output: g
1 for i := 0 to N − 1 do
2   x*_i := x_i := L + (U − L) U(0, 1);
3   v_i := 0;
4 g := arg min_{x_i} f(x_i);
5 for n := 1 to n_max − 1 do
6   θ := θ δ^{1/n_max};
7   for i := 0 to N − 1 do
8     ε_1, ε_2 := U(0, 1);
9     v_i := (θ_0 + θ) v_i + α ε_1 (x*_i − x_i) + β ε_2 (g − x_i);
10    x_i := x_i + v_i;
11    x*_i := arg min{f(x*_i), f(x_i)};
12    g := arg min{f(g), f(x*_i)};

Algorithm 4.11: Particle swarm optimisation pseudocode

Assuming a unit time step ∆t := 1, the position update formula of an individual particle is

x^{n+1}_i := x^n_i + v^{n+1}_i (4.34)

and the velocity

v^{n+1}_i := v^n_i + α ε_1 (x*_i − x^n_i) + β ε_2 (g − x^n_i). (4.35)

The two random vectors ε_1, ε_2 are drawn from the continuous uniform distribution

ε_1 := U(0, 1), (4.36a)
ε_2 := U(0, 1) (4.36b)

thus introducing randomisation to the update formula. Clearly, two main components can be distinguished in equation (4.35), i.e. attraction towards the particle’s best position x*_i found so far and attraction towards the global best position g. Additionally, the ratio between these two attractions is balanced by means of the two coefficients α and β. In order to reduce the velocity as the algorithm progresses, the so called damping function θ is introduced, having the following property, analogous to the cooling formula (3.39)

θ_{n+1} ≤ θ_n. (4.37)

As a result, the stabilised version of the velocity update formula (4.35) is now

v^{n+1}_i := (θ_0 + θ_n) v^n_i + α ε_1 (x*_i − x^n_i) + β ε_2 (g − x^n_i). (4.38)

The above equation is present in line 9 in algorithm 4.11 together with the position update formula (4.34) (line 10). The particle’s best position x*_i and the global best position g are checked after every position update (lines 11 and 12). Initial positions of particles x^0_i are uniformly distributed (line 2) and initial velocities are assumed to be zero, v^0_i := 0 (line 3).
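A NumPy sketch of listing 4.11 with the stabilised velocity update (4.38) is given below. The settings α = β = 2, θ0 = 0.4, θ = 0.5 and δ = 0.01, the box and the sphere objective are illustrative assumptions of mine.

import numpy as np

def particle_swarm(f, L, U, N=20, n_max=20, alpha=2.0, beta=2.0,
                   theta0=0.4, theta=0.5, delta=1e-2, seed=0):
    """Sketch of particle swarm optimisation, eqs (4.34), (4.36)-(4.38)."""
    rng = np.random.default_rng(seed)
    L, U = np.asarray(L, float), np.asarray(U, float)
    D = L.size
    x = L + (U - L) * rng.uniform(size=(N, D))
    v = np.zeros((N, D))
    xs = x.copy()                                    # particle best positions x*_i
    fs = np.array([f(p) for p in x])                 # and their objective values
    g = xs[np.argmin(fs)].copy()
    for _ in range(n_max - 1):
        theta *= delta ** (1.0 / n_max)              # damping, eq. (4.37)
        for i in range(N):
            e1, e2 = rng.uniform(size=D), rng.uniform(size=D)
            v[i] = ((theta0 + theta) * v[i] + alpha * e1 * (xs[i] - x[i])
                    + beta * e2 * (g - x[i]))        # eq. (4.38)
            x[i] = x[i] + v[i]                       # eq. (4.34)
            fi = f(x[i])
            if fi < fs[i]:                           # update particle best
                xs[i], fs[i] = x[i].copy(), fi
                if fi < f(g):                        # and the global best
                    g = x[i].copy()
    return g, f(g)

sphere = lambda p: float(np.sum(p ** 2))             # illustrative objective (assumed)
print(particle_swarm(sphere, [-5, -5], [5, 5]))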

Figure 4.7: 400 evaluations of particle swarm optimisation

Figure 4.7 shows an example realisation of the particle swarm optimisation algorithm for n_max = 20 and N = 20 individuals (400 evaluations of the objective function). As usual, trajectories of individuals are represented by means of the solid polylines.

4.4.2 Accelerated particle swarm optimisation

The accelerated particle swarm optimisation [30] does not take advantage of the particle’s best position x*_i. Additional randomisation and diversity is provided instead by a random vector ε_1. The simplified velocity update formula is now

v^{n+1}_i := (θ_0 + θ_n) v^n_i + α ε_1 + β (g − x^n_i). (4.39)

The random vector ε_1 is drawn from the continuous uniform distribution and is scaled by means of the search space range (U − L)

ε_1 := (U − L) U(−1/2, 1/2). (4.40)



Moreover, the position update formula (4.34) remains intact. The same concerns the damping function θ (4.37). Another obvious difference between equations (4.38) and (4.39) is the lack of the second random vector ε_2. Consequently, two main components can now be distinguished in equation (4.39), i.e. randomisation via α ε_1 and deterministic attraction towards the global best position, β (g − x^n_i).

Input: α, β, θ_0, N, n_max, δ, L, U
Output: g
1 for i := 0 to N − 1 do
2   x_i := L + (U − L) U(0, 1);
3   v_i := 0;
4 g := arg min_{x_i} f(x_i);
5 for n := 1 to n_max − 1 do
6   θ := θ δ^{1/n_max};
7   for i := 0 to N − 1 do
8     ε_1 := (U − L) U(−1/2, 1/2);
9     v_i := (θ_0 + θ) v_i + α ε_1 + β (g − x_i);
10    x_i := x_i + v_i;
11    g := arg min{f(g), f(x_i)};

Algorithm 4.12: Accelerated particle swarm optimisation 1 pseudocode

The new velocity update equation (4.39) is present in line 9 in algorithm 4.12 together with the random vector (4.40) (line 8). The position update formula (4.34) (line 10) is the same. Only the global best position g is updated after every position update (line 11). Initial positions of particles x^0_i are uniformly distributed (line 2) and initial velocities are assumed to be zero, v^0_i := 0 (line 3).

Figure 4.8: 400 evaluations of accelerated particle swarm optimisation 1

Figure 4.9: 400 evaluations of accelerated particle swarm optimisation 2



In order to avoid initialisation of velocities, it is possible to simplify the accelerated particle swarm optimisation even further. Substituting equation (4.39) into (4.34) and removing the velocity, we have

x^{n+1}_i := x^n_i + (α_0 + α_n) ε_1 + β (g − x^n_i). (4.41)

Furthermore, the coefficient α is replaced by another damping function α, having the following property

α_{n+1} ≤ α_n. (4.42)

The random vector ε_1 remains the same (equation (4.40)). The second version of the accelerated particle swarm optimisation in pseudocode form is shown in listing 4.13. It is shorter in comparison with listing 4.12 as there is no need to initialise and update velocities. The position update equation (4.41) is present in line 8.

Input: α_0, β, N, n_max, δ, L, U
Output: g
1 for i := 0 to N − 1 do
2   x_i := L + (U − L) U(0, 1);
3 g := arg min_{x_i} f(x_i);
4 for n := 1 to n_max − 1 do
5   α := α δ^{1/n_max};
6   for i := 0 to N − 1 do
7     ε_1 := (U − L) U(−1/2, 1/2);
8     x_i := x_i + (α_0 + α) ε_1 + β (g − x_i);
9     g := arg min{f(g), f(x_i)};

Algorithm 4.13: Accelerated particle swarm optimisation 2 pseudocode

Figures 4.8 and 4.9 display example realisations of the two versions of the accelerated particle swarm optimisation algorithm for n_max = 20 and N = 20 individuals (equivalent to 400 evaluations of the objective function). Trajectories of individuals are represented by means of the solid polylines. The acceleration is obvious when compared to the standard particle swarm optimisation in figure 4.7.

4.4.3 Firefly algorithm

The firefly algorithm [31] can be regarded as a variant of the particle swarm optimisation. It is inspired by the flashing light of fireflies. Fireflies are attracted to one another and the attractiveness is proportional to the light intensity (objective function value). Additionally, the light intensity decreases as the distance between two fireflies increases.

The structure of the firefly algorithm is shown in listing 4.14. The movement (update formula) of a firefly x_i towards a more attractive firefly x_j is given by an equation similar to (4.41) (line 11 in listing 4.14)

x_i := x_i + α ε + (β_0 + β e^{−γ‖x_j − x_i‖²}) (x_j − x_i). (4.43)



In the above formula ‖x_j − x_i‖ represents the distance between the two fireflies x_i and x_j, and the whole expression (β_0 + β e^{−γ‖x_j − x_i‖²}) (x_j − x_i) is known as the attraction term. The attractiveness of a firefly x_i is directly related to its brightness I_i

I(r) = I_0 e^{−γ‖r‖²} (4.44)

which is directly proportional to the objective function value f(x_i). The term β_0 + β e^{−γ‖x_j − x_i‖²} combines light absorption, where γ is the light absorption coefficient, and light intensity variation according to the inverse square law. The attractiveness at ‖x_j − x_i‖ = 0 is indicated here as β_0 + β.

Input: α, β, β_0, γ, N, n_max, δ, L, U
Output: g
1 for i := 0 to N − 1 do
2   x_i := L + (U − L) U(0, 1);
3 for n := 0 to n_max − 1 do
4   Sort(x_i) based on f(x_i);
5   g := x_0;
6   y := x;
7   α := α δ^{1/n_max};
8   for i := 0 to N − 1 do
9     for j := 0 to i − 1 do
10      ε := (U − L) U(−1/2, 1/2);
11      x_i := x_i + α ε + (β_0 + β e^{−γ‖x_j − x_i‖²}) (y_j − x_i);

Algorithm 4.14: Firefly algorithm pseudocode

The term α ε in equation (4.43) is randomisation. Obviously α stands for a randomisation parameter and controls the randomness of the movement. The randomisation parameter α is gradually reduced

α_{n+1} ≤ α_n (4.45)

by means of α := α δ^{1/n_max} (line 7 in listing 4.14) for a typical value of δ = 10^{−4}. This coefficient controls the step size in order to gradually reduce the motion of the fireflies. By U(−1/2, 1/2) one understands a value sampled from the continuous uniform distribution parametrised by −1/2 and 1/2. The randomisation should be understood as a separate randomisation of each kth component of the spatial coordinate, i.e., α U(−1/2, 1/2) |U_k − L_k|. Lower and upper box constraints are denoted as L_k and U_k respectively. In vector notation we have (line 10 in listing 4.14)

ε := (U − L) U(−1/2, 1/2). (4.46)

There are two main differences between the PSO and FA algorithms. Firstly, the update formula (4.43) includes a light absorption term related to the inverse square law. Secondly, a double loop is present in algorithm 4.14 (line 9).
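A NumPy sketch of listing 4.14 follows; the settings α = 0.25, β = 1, β0 = 0.2, γ = 1 and δ = 10⁻⁴, the box and the sphere objective are illustrative assumptions of mine.

import numpy as np

def firefly(f, L, U, N=20, n_max=20, alpha=0.25, beta=1.0, beta0=0.2,
            gamma=1.0, delta=1e-4, seed=0):
    """Sketch of the firefly algorithm, eqs (4.43)-(4.46)."""
    rng = np.random.default_rng(seed)
    L, U = np.asarray(L, float), np.asarray(U, float)
    D = L.size
    x = L + (U - L) * rng.uniform(size=(N, D))
    g = x[0].copy()
    for _ in range(n_max):
        order = np.argsort([f(p) for p in x])        # sort fireflies by brightness
        x = x[order]
        g = x[0].copy()                              # brightest (best) firefly
        y = x.copy()                                 # snapshot of positions for this sweep
        alpha *= delta ** (1.0 / n_max)              # damp the randomisation, eq. (4.45)
        for i in range(N):
            for j in range(i):                       # move x_i towards every brighter firefly
                eps = (U - L) * rng.uniform(-0.5, 0.5, size=D)          # eq. (4.46)
                attract = beta0 + beta * np.exp(-gamma * np.sum((y[j] - x[i]) ** 2))
                x[i] = x[i] + alpha * eps + attract * (y[j] - x[i])     # eq. (4.43)
    return g, f(g)

sphere = lambda p: float(np.sum(p ** 2))             # illustrative objective (assumed)
print(firefly(sphere, [-5, -5], [5, 5]))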



Figure 4.10: 400 evaluations of firefly algorithm

Figure 4.10 displays an example realisation of the firefly algorithm for n_max = 20 and N = 20 individuals (equivalent to 400 evaluations of the objective function). Trajectories of individuals are represented by means of the solid polylines.

Several variants of the firefly algorithm exist. One of them, namely the rotational firefly algorithm [24], is briefly discussed here. The difference between FA and the rotational firefly rests on the generation of C clusters G_i of fireflies. The division into clusters is a matter of convention. An example of the division into disjoint clusters (subsets) G_i is given by

∀_{i≠j} G_i ∩ G_j = ∅ ∧ ⋃_{i=1}^{C} G_i = {x_1, . . . , x_N}. (4.47)

The number of clusters C may be chosen from [1; ⌊N/2⌋]. Each cluster is then rotated by an angle ϑ around a pole x_r. The pole (centre of rotation) is defined as

x_r = x_a + κ (x_b − x_a) (4.48)

where the average firefly x_a is calculated as

x_a = (1/|G_i|) Σ_{i=1}^{|G_i|} x_i. (4.49)

The coefficient κ is taken from κ ∈ [0; 1]. Two extreme cases are possible. For κ = 0 the cluster rotates around the average firefly (4.49), whereas for κ = 1 the centre of rotation is the best firefly x_b = arg min_{x_j ∈ G_i} f(x_j) of cluster G_i. An intermediate case κ = 1/2 can be assumed. The division into adjacent clusters can be kept as simple as possible. The cluster size is ⌊N/C⌋ and the remaining cluster, if any, is rotated if its size N mod C > ⌊N/C⌋. Rotations are intuitive and well defined in two- and three-dimensional spaces but they are not limited only to those cases. This is because the spherical coordinates



are defined in D-dimensional spaces and are analogous to the spherical coordinate system defined for three-dimensional space and the polar system for two-dimensional spaces. The actual rotation in the hyperspherical coordinates (r, φ_1, . . . , φ_{D−1}) of any point x = (x_1, . . . , x_D) is calculated by means of the following transformation

x_1 = r cos φ_1, (4.50a)
x_k = r cos φ_k ∏_{i=1}^{k−1} sin φ_i, if 1 < k < D, (4.50b)
x_D = r ∏_{i=1}^{D−1} sin φ_i (4.50c)

where the last angle φ_{D−1} is increased by ϑ, i.e. φ_{D−1} := φ_{D−1} + ϑ. Cluster generation and rotation are performed before the two main loops in algorithm 4.14 take place.

Figure 4.11: 400 evaluations of the bat algorithm

4.4.4 Bat algorithm

The bat algorithm [28] is a bio-inspired and simultaneously swarm intelligence based, global optimisation algorithm. Inspiration comes from the echolocation of bats. Properties and simplifications of the simplest bat algorithm are:

• Bats perform a random flight with velocity v_i based on the pulse rate r.
• A random local search is performed, based on the pulse rate r, around the current global best g.
• In principle, the frequency of the emitted pulse F is adjustable in order to correspond at least to the search domain. Nevertheless, a random value of F is assumed, F ∈ [F_l; F_u].
• In general, the rate of the emitted pulse r is adjustable by bats – the closer to the prey, the faster r. However, a constant value of r is assumed.
• In spite of the fact that the loudness A varies (the closer to the prey, the quieter), it is assumed to be constant.

Input: α, A, r, F_l, F_u, N, n_max, L, U
Output: g
1 for i := 0 to N − 1 do
2   x_i := L + (U − L) U(0, 1);
3   v_i := 0;
4 g := arg min_{x_i} f(x_i);
5 for n := 1 to n_max − 1 do
6   for i := 0 to N − 1 do
7     if r < U(0, 1) then
8       ε := (U − L) N(0, 1);
9       y := g + α ε;
10    else
11      F := F_l + (F_u − F_l) U(0, 1);
12      v_i := v_i + F (g − x_i);
13      y := x_i + v_i;
14    if f(y) < f(x_i) and U(0, 1) < A then
15      x_i := y;
16    g := arg min{f(g), f(y)};

Algorithm 4.15: Bat algorithm pseudocode

In other words, the bat algorithm is a combination of two random walks. Switching between the individual walks is controlled by the probability r. Consequently, the intermediate update equation is

y := { g + α ε, if r < U(0, 1);  x^n_i + v^{n+1}_i, otherwise }. (4.51)

The random vector ε, present in the local search, is drawn from the standard normal distribution scaled by the search domain

ε := (U − L) N(0, 1). (4.52)

The velocity of the random flight v_i is adjustable by means of the frequency F

v^{n+1}_i := v^n_i + F (g − x^n_i). (4.53)

Random variation of the frequency F is limited by the lower F_l and upper F_u values. This leads to

F := F_l + (F_u − F_l) U(0, 1). (4.54)

If the intermediate individual y is better than its predecessor x^n_i then it is passed on to the new population with a probability (loudness) A

x^{n+1} := { y, if f(y) < f(x_i) and U(0, 1) < A;  x^n, otherwise }. (4.55)



The bat algorithm in pseudocode form is shown in listing 4.15. Moreover, it is regarded as another variant of the particle swarm optimisation. Figure 4.11 displays an example realisation of the bat algorithm for n_max = 20 and N = 20 individuals (equivalent to 400 evaluations of the objective function). Solid polylines correspond to trajectories of individuals.
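A NumPy sketch of listing 4.15 is given below; the settings α = 0.1, A = 0.7, r = 0.5, F_l = 0, F_u = 1, the box and the sphere objective are illustrative assumptions of mine.

import numpy as np

def bat_algorithm(f, L, U, N=20, n_max=20, alpha=0.1, A=0.7, r=0.5,
                  Fl=0.0, Fu=1.0, seed=0):
    """Sketch of the bat algorithm, eqs (4.51)-(4.55)."""
    rng = np.random.default_rng(seed)
    L, U = np.asarray(L, float), np.asarray(U, float)
    D = L.size
    x = L + (U - L) * rng.uniform(size=(N, D))
    v = np.zeros((N, D))
    fx = np.array([f(p) for p in x])
    g = x[np.argmin(fx)].copy()
    for _ in range(n_max - 1):
        for i in range(N):
            if r < rng.uniform():
                # local random search around the global best, eq. (4.51), first branch
                y = g + alpha * (U - L) * rng.standard_normal(D)
            else:
                # random flight with a random pulse frequency, eqs (4.53)-(4.54)
                F = Fl + (Fu - Fl) * rng.uniform()
                v[i] = v[i] + F * (g - x[i])
                y = x[i] + v[i]
            fy = f(y)
            if fy < fx[i] and rng.uniform() < A:     # loudness-gated acceptance, eq. (4.55)
                x[i], fx[i] = y, fy
            if fy < f(g):                            # track the global best
                g = y.copy()
    return g, f(g)

sphere = lambda p: float(np.sum(p ** 2))             # illustrative objective (assumed)
print(bat_algorithm(sphere, [-5, -5], [5, 5]))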

Figure 4.12: 400 evaluations of cuckoo search (20 individuals)

Figure 4.13: 4000 evaluations of cuckoo search (200 individuals)

4.4.5 Cuckoo search

Cuckoo search [27] is inspired by the brood parasitism of cuckoos. Cuckoo eggs represent individuals and are evaluated by the objective function. Eggs are dropped by cuckoos randomly and can be discovered by the host with probability p and abandoned.

In practice, however, cuckoo search is simply a sequence of global and local random searches. The former is performed by means of a Levy flight whereas the latter by a random walk. What is more, the subsequent local random search is controlled by a probability p. The structure of cuckoo search is shown in listing 4.16. The intermediate update equation (line 10) for the global random search is

y := x^n_i + α ε_1 (4.56)

where the random vector ε_1 (line 9) is drawn from the symmetrical Levy stable like distribution (3.20)

ε_1 := σ N(0, 1) / |N(0, 1)|^{1/λ} (U − L) ε/‖ε‖ (4.57)

and ε is taken from the standard normal distribution (line 8)

ε := N(0, 1). (4.58)



The better solution of the intermediate y and the original individual x^n_i, in terms of the objective function value, is passed on to the next generation x^{n+1}_i (line 11)

x^{n+1}_i := { y, if f(y) < f(x^n_i);  x^n_i, otherwise }. (4.59)

Input: α, λ, N, n_max, p, L, U
Output: g
1 for i := 0 to N − 1 do
2   x_i := L + (U − L) U(0, 1);
3 g := arg min_{x_i} f(x_i);
4 σ := (Γ(1 + λ) sin(πλ/2) / (Γ((1 + λ)/2) λ 2^{(λ−1)/2}))^{1/λ};
5 n := N − 1;
6 repeat
7   for i := 0 to N − 1 do
8     ε := N(0, 1);
9     ε := σ N(0, 1) / |N(0, 1)|^{1/λ} (U − L) ε/‖ε‖;
10    y := x_i + α ε;
11    x_i := arg min{f(x_i), f(y)};
12    n := n + 1;
13  a, b := RandomPermutation(0, . . . , N − 1);
14  for i := 0 to N − 1 do
15    ε := U(0, 1) (x_{a_i} − x_{b_i});
16    y := x_i + α ε H(p − U(0, 1));
17    if x_i ≠ y then
18      x_i := arg min{f(x_i), f(y)};
19      n := n + 1;
20  g := arg min_{x_i} f(x_i);
21 until n ≥ n_max;

Algorithm 4.16: Cuckoo search pseudocode

The local random search update equation includes the abandonment probability p. Consequently, the intermediate individual y is (line 16)

y := x^{n+1}_i + α ε_2 H(p − U(0, 1)). (4.60)

Two different individuals x^{n+1}_j and x^{n+1}_k are taken randomly from the current population by means of random permutations (line 13). Two random permutation sets a, b of N numbers are generated separately in order to provide the consecutive indexes j and k. The random vector ε_2 is drawn from the continuous uniform distribution and is scaled by means of the difference (x^{n+1}_j − x^{n+1}_k)

ε_2 := U(0, 1) (x^{n+1}_j − x^{n+1}_k). (4.61)



As a result, this resembles a local random search (line 15). Additionally, the Heaviside step function H is utilised in order to account for the abandonment probability p. For p := 1 equations (4.60) and (4.61) are similar to differential evolution (4.22). If the new individual y is better than its predecessor x^{n+1}_i then it is passed on to the new population

x^{n+2}_i := { y, if f(y) < f(x^{n+1}_i);  x^{n+1}_i, otherwise }. (4.62)

Figure 4.12 displays an example realisation of cuckoo search for N = 20 individuals and 400 evaluations of the objective function. Solid polylines correspond to trajectories of individuals. Figure 4.13 presents an interesting property of cuckoo search, namely its ability to localise many local minima at the same time. This is, however, possible only if the population is large enough.
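The sketch below follows listing 4.16 and counts objective function evaluations explicitly; the settings α = 0.1, λ = 1.5 and p = 0.25, the box and the sphere objective are illustrative assumptions of mine.

import numpy as np
from math import gamma, sin, pi

def cuckoo_search(f, L, U, N=20, n_max=400, alpha=0.1, lam=1.5, p=0.25, seed=0):
    """Sketch of cuckoo search, eqs (4.56)-(4.62), terminating on the evaluation budget."""
    rng = np.random.default_rng(seed)
    L, U = np.asarray(L, float), np.asarray(U, float)
    D = L.size
    sigma = (gamma(1 + lam) * sin(pi * lam / 2)
             / (gamma((1 + lam) / 2) * lam * 2 ** ((lam - 1) / 2))) ** (1 / lam)
    x = L + (U - L) * rng.uniform(size=(N, D))
    fx = np.array([f(q) for q in x])
    evals = N
    while evals < n_max:
        for i in range(N):                           # global random search via Levy-like steps
            e = rng.standard_normal(D)
            e1 = (sigma * rng.standard_normal(D)
                  / np.abs(rng.standard_normal(D)) ** (1 / lam)
                  * (U - L) * e / np.linalg.norm(e))           # eq. (4.57)
            y = x[i] + alpha * e1                              # eq. (4.56)
            fy = f(y); evals += 1
            if fy < fx[i]:
                x[i], fx[i] = y, fy
        a, b = rng.permutation(N), rng.permutation(N)
        for i in range(N):                           # local random search with abandonment
            e2 = rng.uniform(size=D) * (x[a[i]] - x[b[i]])     # eq. (4.61)
            y = x[i] + alpha * e2 * (p > rng.uniform())        # eq. (4.60)
            if not np.array_equal(y, x[i]):
                fy = f(y); evals += 1
                if fy < fx[i]:
                    x[i], fx[i] = y, fy
    best = np.argmin(fx)
    return x[best], fx[best]

sphere = lambda q: float(np.sum(q ** 2))             # illustrative objective (assumed)
print(cuckoo_search(sphere, [-5, -5], [5, 5]))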


Chapter 5

Constraints

5.1 Unconstrained and constrained optimisation

As has been previously mentioned, the general problem of unconstrained optimisation, i.e., minimisation in this case, is expressed as

min_{x ∈ R^D} f(x) = f_0 (5.1)

where f : R^D → R is the objective function to be minimised and x ∈ R^D is an independent variable. Moreover, the argument x_0 of the minimum value f_0 of the objective function f is defined as

x_0 = arg min_{x ∈ R^D} f(x). (5.2)

In other words, the general problem of unconstrained optimisation is the process of optimising an objective function f in the absence of constraints on the independent variable. This simply means that the independent variable x ∈ R^D.

The constrained optimisation problem, i.e., minimisation in this case, is given by

min_{x ∈ Ω} f(x) = f_0 (5.3)

where Ω is a constraint set, or the so called optimisation domain. This time, however, the argument x_0 of the minimum value f_0 of the objective function f is defined as

x_0 = arg min_{x ∈ Ω} f(x), (5.4)

meaning that the constrained optimisation problem is the process of optimising an objective function f in the presence of constraints Ω on the independent variable x, or simply

Ω ⊆ R^D. (5.5)

We can distinguish:

Page 64: Krzysztof Tesch - Politechnika Gdańskakrzyte/students/optimisation_book.pdf · Continuous optimisation algorithms ... 4.4.4 Bat algorithm ... 7.6 Multi-objective description of Murray’s

5.2. Lagrange multipliers 63

• Equality constraints. The constraint set is expressed by means of equality con-straint functions gi

Ω :=x ∈ RD : gi(x) = 0

. (5.6)

The index i belongs to the index set of equality constraints i ∈ 1, . . . ,m.• Inequality constraints. The constraint set is expressed by means of inequality

constraint functions hj

Ω :=x ∈ RD : hj(x) ≤ 0

. (5.7)

The index j belongs to the index set of equality constraints j ∈ 1, . . . , k.• Box constraints. The constraint set is expressed by means of simplified inequal-

ity constraintsΩ :=

x ∈ RD : Li ≤ xi ≤ Ui

. (5.8)

This also means that box constraints are the special case of inequality con-straints where hj(xi) := Li−xi and so on. What is more, this type of constraintsis commonly met in optimisation practice. As usual, Li and Ui are lower andupper bounds, respectively. If Li := −∞ and Ui := ∞ for all i ∈ 1, . . . , Dthen the box constrained problem becomes unconstrained.

• Equality and inequality constraints. The constraint set is expressed by meansof equality gi and inequality hj constraint functions

Ω :=x ∈ RD : gi(x) = 0, hj(x) ≤ 0

. (5.9)

The above set expresses the most general constrained optimisation problem.

5.2 Lagrange multipliers

5.2.1 The method

The method of Lagrange multipliers is a method for converting constrained optimisa-tion problems to unconstrained problems. Let us assume, without loss of generality,a two-dimensional function f : R2 → R to be minimised with an equality constraintg of the form (5.6)

z = f(x, y), (5.10a)

g(x, y) = 0. (5.10b)

Assuming further that y can be explicitly expressed from g as a function of x andsubstituted into equation (5.10a), namely z = f(x, y(x)), we have the necessary con-dition for optimality dz

dx = 0 and at the same time dgdx = 0. Using the chain rule, the

two above conditions are

∂f

∂x+∂f

∂y

dy

dx= 0, (5.11a)

∂g

∂x+∂g

∂y

dy

dx= 0. (5.11b)

Page 65: Krzysztof Tesch - Politechnika Gdańskakrzyte/students/optimisation_book.pdf · Continuous optimisation algorithms ... 4.4.4 Bat algorithm ... 7.6 Multi-objective description of Murray’s

64 5. Constraints

By virtue of relations

dy

dx= −

∂f∂x∂f∂y

= −∂g∂x∂g∂y

, (5.12)

we have a constant multiplier, i.e., a Lagrange multiplier

λ := −∂f∂x∂g∂x

= −∂f∂y

∂g∂y

. (5.13)

The necessary condition for optimality (5.11) is therefore

∂f

∂x+ λ

∂g

∂x= 0, (5.14a)

∂f

∂y+ λ

∂g

∂y= 0. (5.14b)

Introducing an auxiliary function F , sometimes referred to as the so called Lagrangianfunction

F (x, y, λ) := f(x, y) + λ g(x, y) (5.15)

we have the necessary condition for optimality

∇F (x, y, λ) = 0. (5.16)

One has to keep in mind that the Lagrangian function F is now three-dimensional incomparison with the original two-dimensional function f , Equivalently, the necessarycondition for optimality (5.16) is

∂F

∂x=∂f

∂x+ λ

∂g

∂x= 0, (5.17a)

∂F

∂y=∂f

∂y+ λ

∂g

∂y= 0, (5.17b)

∂F

∂λ= g = 0. (5.17c)

It can be easily verified that the above three conditions correspond to equations (5.14)and the equality constraint (5.10b). Typically, the value of λ is of no interest. If itis possible, it should be first solved for λ in order to remove it from the system ofequations.

5.2.2 Equality constraints

If there are m equality constraints gi of the form (5.6), the method of Lagrangemultipliers can be easily extended. It is enough to introduce the following form of theLagrangian function

F (x, λ1, . . . , λm) := f(x) +

m∑i=1

λigi(x). (5.18)

Page 66: Krzysztof Tesch - Politechnika Gdańskakrzyte/students/optimisation_book.pdf · Continuous optimisation algorithms ... 4.4.4 Bat algorithm ... 7.6 Multi-objective description of Murray’s

5.2. Lagrange multipliers 65

Thus, again we convert constrained optimisation problem with m equality constraintsto unconstrained problem. Furthermore, if the following vectors are introduced

λ := λ1, . . . , λm , (5.19a)

g := g1, . . . , gm (5.19b)

then the Lagrangian function form (5.18) is similar to the case with one constraint(5.15)

F (x,λ) := f(x) + λ · g(x). (5.20)

Obviously, the Lagrange multiplier vector λ is of the same size as constraints vectorg, namely m.

−5 −2.5 02.5

5 −5−2.5

02.5

5

−20

20

60

xy

z

−4 −2 0 2 4

−4

−2

0

2

4

x

y

−4 −2 0 2 4

−4

−2

0

2

4

Figure 5.1: Equality constraints example

Let us consider the following optimisation example. We have a two-dimensionalobjective function f : R2 → R to be minimised

f(x, y) := 2x2 + 4y (5.21)

subject to the following equality constraint (see figure 5.1)

Ω :=

(x, y) ∈ R2 : g(x, y) := xy − 1 = 0. (5.22)

This also means that m := 1. According to equation (5.18) or (5.20), the three-dimensional Lagrangian function F : R3 → R is

F (x, y, λ) := 2x2 + 4y + λ (xy − 1) . (5.23)

As a consequence, the Lagrangian function is unconstrained. From the necessarycondition for optimality ∇F = 0 it arises that

4x+ λy = 0, (5.24a)

4 + λx = 0, (5.24b)

xy − 1 = 0. (5.24c)

Page 67: Krzysztof Tesch - Politechnika Gdańskakrzyte/students/optimisation_book.pdf · Continuous optimisation algorithms ... 4.4.4 Bat algorithm ... 7.6 Multi-objective description of Murray’s

66 5. Constraints

The above system of equations has to be solved now. Since we are not interested inthe value of λ, we first solve for λ in terms of x and y. For instance, taking intoconsideration equation (5.24b) we have λ = −4x−1. Substituting λ into equations(5.24a) and (5.24c) results in two equation system and two unknowns x and y. Thecritical point, i.e., solution of the necessary condition for optimality ∇F = 0, is (1, 1).Thus we have the minimum

x0 = (1, 1) = arg minx∈Ω

f(x) (5.25)

and f(1, 1) = 6. Finally, the value of the objective function at x0 = (1, 1) yields

minx∈Ω

f(x) = f0 = 6. (5.26)

5.2.3 Inequality constraints

The method of Lagrange multipliers can be further extended to problems with kinequality constraints hj of the form (5.7). Firstly, the so called penalty function pjhas to be introduced

pj(x, βj) := hj(x) + β2j (5.27)

where βj ∈ R is a penalty variable. Secondly, the following form of the Lagrangianfunction is assumed

F (x, λ∗1, . . . , λ∗k, β1, . . . , βk) := f(x) +

k∑j=1

λ∗j(hj(x) + β2

j

). (5.28)

As previously, we convert constrained optimisation problem, this time, however, withk inequality constraints, to unconstrained problem. Given that, the following vectorsare introduced

λ∗ := λ∗1, . . . , λ∗k , (5.29a)

h := h1, . . . , hk , (5.29b)

β := β1, . . . , βk , (5.29c)

the Lagrangian function form (5.29) is now

F (x,λ∗,β) := f(x) + λ∗ · (h(x) + β β) . (5.30)

The Lagrange multiplier vector λ∗ is of the same size as constraints vector h and thepenalty variables vector β, namely k.

Let us consider the next optimisation example. A two-dimensional objective func-tion f : R2 → R to be minimised

f(x, y) := x2 + y2 (5.31)

subject to the following inequality constraint (see figure 5.2)

Ω :=

(x, y) ∈ R2 : h(x, y) := 1− x ≤ 0. (5.32)

Page 68: Krzysztof Tesch - Politechnika Gdańskakrzyte/students/optimisation_book.pdf · Continuous optimisation algorithms ... 4.4.4 Bat algorithm ... 7.6 Multi-objective description of Murray’s

5.2. Lagrange multipliers 67

The penalty function is p(x, β) := h(x) + β2, where β ∈ R. This means thatk := 1. According to equation (5.28) or (5.30), the unconstrained four-dimensionalLagrangian function F : R4 → R is

F (x, y, λ∗, β) := x2 + y2 + λ∗(1− x+ β2

). (5.33)

From the necessary condition for optimality ∇F = 0 we have the following system offour equations with four unknowns

2x− λ∗ = 0, (5.34a)

y = 0, (5.34b)

1− x+ β2 = 0, (5.34c)

λ∗β = 0. (5.34d)

In order to solve the system (5.34), two cases have to considered (equation (5.34d)).The first case is λ∗ = 0. By equations (5.34b) and (5.34c), this immediately impliesx = y = 0 and most importantly β2 = −1, which is impossible by the assumptionβ ∈ R. Because of this inconsistency, (0, 0) cannot be a critical point. Also, (0, 0) /∈ Ω,see figure 5.2. The second case is β = 0. By equations (5.34b) and (5.34c), thisimplies x = 1 and y = 0. The critical point, i.e., solution of the necessary conditionfor optimality ∇F = 0, is (1, 0). Thus we have the minimum

x0 = (1, 0) = arg minx∈Ω

f(x), (5.35)

and f(1, 0) = 1. Finally, the value of the objective function at x0 = (1, 0) yields

minx∈Ω

f(x) = f0 = 1. (5.36)

−2 −10

12 −2

−1

01

20

4

8

xy

z

−4 −2 0 2 4

−4

−2

0

2

4

x

y

−4 −2 0 2 4

−4

−2

0

2

4

Figure 5.2: Inequality constraints example

Page 69: Krzysztof Tesch - Politechnika Gdańskakrzyte/students/optimisation_book.pdf · Continuous optimisation algorithms ... 4.4.4 Bat algorithm ... 7.6 Multi-objective description of Murray’s

68 5. Constraints

5.2.4 Equality and inequality constraints

We simply combine the two previous cases, namely equation (5.18) and (5.28). Thusthe Lagrangian function is

F (x, λ1, . . . , λm, λ∗1, . . . , λ

∗k, β1, . . . , βk) :=

f(x) +

m∑i=1

λigi(x) +

k∑j=1

λ∗j(hj(x) + β2

j

). (5.37)

We have m equality constraints gi of the form (5.6) and k inequality constraints hj ofthe form (5.7) together with k penalty functions pj . Equivalently, in vector notationthe Lagrangian function (5.37) is now

F (x,λ,λ∗,β) := f(x) + λ · g(x) + λ∗ · (h(x) + β β) . (5.38)

Definitions (5.19) and (5.29) hold.

−2 −10

12 −2

−1

01

20

4

8

xy

z

−4 −2 0 2 4

−4

−2

0

2

4

x

y

−4 −2 0 2 4

−4

−2

0

2

4

Figure 5.3: Equality and inequality constraints example

Let us consider the next optimisation example. A two-dimensional objective func-tion f : R2 → R to be minimised

f(x, y) := x2 + y2 (5.39)

subject to the following equality and inequality constraints (see, figure 5.3)

Ω :=

(x, y) ∈ R2 : g(x, y) := y − x− 1;h(x, y) := 1− x ≤ 0. (5.40)

Since we deal with equality and also inequality constraints, the penalty functionp(x, β) := h(x) + β2 is necessary, where β ∈ R. In this case m := 1 and k := 1.According to equation (5.37) or (5.38), the unconstrained five-dimensional Lagrangianfunction F : R5 → R is

F (x, y, λ, λ∗, β) := x2 + y2 + λ (y − x− 1) + λ∗(1− x+ β2

). (5.41)

Page 70: Krzysztof Tesch - Politechnika Gdańskakrzyte/students/optimisation_book.pdf · Continuous optimisation algorithms ... 4.4.4 Bat algorithm ... 7.6 Multi-objective description of Murray’s

5.2. Lagrange multipliers 69

From the necessary condition for optimality ∇F = 0 we have the following system offive equations with five unknowns

2x− λ− λ∗ = 0, (5.42a)

2y + λ = 0, (5.42b)

y − x− 1 = 0, (5.42c)

1− x+ β2 = 0, (5.42d)

λ∗β = 0. (5.42e)

In order to solve the above system of equation, two cases have to considered, namelyλ∗ = 0 and β = 0. Following the same line of reasoning, the critical point, i.e.,solution of the necessary condition for optimality ∇F = 0, is (1, 2). Thus we havethe minimum

x0 = (1, 2) = arg minx∈Ω

f(x), (5.43)

and f(1, 2) = 5. Finally, the value of the objective function at x0 = (1, 2) yields

minx∈Ω

f(x) = f0 = 5. (5.44)

5.2.5 Box constraints

The box constraints are often considered to be natural in numerical optimisationproblems. Lower Li and upper Ui bounds are usually necessary in order to generateinitial population. If the updated individual xn+1

i is out of range, i.e., xn+1i /∈ Ω, the

process can be repeated until xn+1i ∈ Ω. If, however, constrained optimisation prob-

lem has to be converted to unconstrained problem, penalty functions are necessaryfor every dimension j ∈ 1, . . . , D

pj(x, βj) := β2j − (Ui − xi)(xi − Li) (5.45)

where βj ∈ R is a penalty variable. The Lagrangian function is

F (x, λ1, . . . , λD, β1, . . . , βD) := f(x) +

D∑j=1

λj(β2j − (Ui − xi)(xi − Li)

)(5.46)

or introducing

λ := λ1, . . . , λD , (5.47a)

β := β1, . . . , βD , (5.47b)

L := L1, . . . , LD , (5.47c)

U := U1, . . . , UD (5.47d)

we haveF (x,λ,β) := f(x) + λ · (β β − (U− x) (x− L)) . (5.48)

The original D-dimensional constrained optimisation problem is now converted to3D-dimensional unconstrained problem.

Page 71: Krzysztof Tesch - Politechnika Gdańskakrzyte/students/optimisation_book.pdf · Continuous optimisation algorithms ... 4.4.4 Bat algorithm ... 7.6 Multi-objective description of Murray’s

70 5. Constraints

5.3 Penalty function method

Similarly to the method of Lagrange multipliers, the penalty function method isused in order to convert the original constrained optimisation problem to a seriesof unconstrained problems. An auxiliary function F is introduced, consisting of theoriginal function f to be minimised subject to inequality constraints hj in the form(5.7) plus the penalty function P

F (x, γk) := f(x) + γkP (x) (5.49)

where γk > 0 is a penalty parameter. Most importantly, the penalty function P isregarded as a measure of infringement of inequality constraints. The assumed formof the penalty function should be zero if x ∈ Ω and nonzero if x /∈ Ω, i.e.,

P (x)

= 0, if x ∈ Ω;

> 0, if x /∈ Ω.(5.50)

One of the possible forms of the penalty function is

P (x) :=

k∑j=1

maxr 0, hj(x) (5.51)

where r ∈ 1, 2, . . .. The converted unconstrained optimisation problem (5.49) isthen solved with the sequence

limk→∞

F (x, γk) = f(x) (5.52)

whose solutions converge to the solution of the original constrained optimisation prob-lem. What is more, the auxiliary function (5.49) still depends on D variables as γk isregarded only as a parameter.

In order to illustrate the method, let us first assume r := 2. Now, we wish tominimise function (5.31) subject to inequality constraint (5.32). Firstly, it is necessaryto formulate the auxiliary function F according to equation (5.49)

F (x, y, γk) := x2 + y2 + γk max20, 1− x. (5.53)

From the necessary condition for optimality ∇F = 0 we have the following twoequations with two unknowns

x− γk max20, 1− x = 0, (5.54a)

y = 0. (5.54b)

By equation (5.54b), this implies that y0 = 0 and by equation (5.54a) we have x −γk(1− x) = 0. The optimal solution x0 is obtained by letting k →∞

x0 = limk→∞

γk1 + γk

= 1. (5.55)

Finally, x0(1, 0).

Page 72: Krzysztof Tesch - Politechnika Gdańskakrzyte/students/optimisation_book.pdf · Continuous optimisation algorithms ... 4.4.4 Bat algorithm ... 7.6 Multi-objective description of Murray’s

5.4. Barrier method 71

5.4 Barrier method

Similarly to the penalty function method and method of Lagrange multipliers, barriermethod converts the original constrained optimisation problem to a series of uncon-strained problems. The original function f to be minimised subject to inequalityconstraints hj in the form (5.7) is converted by means of the auxiliary function Fgiven by equation (5.49). This time, however, γk > 0 is a barrier parameter and Pa barrier function preventing the solution leaving the constraint set Ω. The assumedform of the barrier function should tends to infinity as the solution approaches to theboundary of the constraint set. One of the possible forms of the barrier function isthe so called logarithmic barrier function

P (x) := −k∑j=1

lnhj(x). (5.56)

The converted unconstrained optimisation problem

F (x, γk) := f(x)− γkk∑j=1

lnhj(x). (5.57)

is then solved with the sequence k →∞ for γk → 0.In order to illustrate barrier method, let us minimise again function (5.31) subject

to inequality constraint (5.32). The auxiliary function F according to equation (5.57)is

F (x, y, γk) := x2 + y2 − γk ln(1− x). (5.58)

From the necessary condition for optimality ∇F = 0 we have the following twoequations with two unknowns

2x+γk

1− x = 0, (5.59a)

y = 0. (5.59b)

From equation (5.59b) it immediately follows that y0 = 0 and by equation (5.59a) wehave (1− x)2x+ γk = 0. Since x > 1

x = 12 + 1

2

√1 + 2γk. (5.60)

For k →∞ we have γk → 0, thus the optimal solution x0 = 1 and x0(1, 0).

Page 73: Krzysztof Tesch - Politechnika Gdańskakrzyte/students/optimisation_book.pdf · Continuous optimisation algorithms ... 4.4.4 Bat algorithm ... 7.6 Multi-objective description of Murray’s

Chapter 6

Variational calculus

6.1 Functional and its variation

Informally, a functional is considered to be a function of functions. More formally,it is a function mapping from a space of functions to the set of real numbers. Forinstance

J [y] =

x2w

x1

F (x, y, y′, y′′) dx (6.1)

is a functional. Having the specific equation y(x) := . . ., it is possible to calculate thevalue of J . Variational calculus involve methods of either minimising or maximisingfunctional similar to (6.1). Consequently, functions that minimise or maximise afunctional are referred to as extremal functions or simply extremals.

The variation δJ of a functional J is given by

δJ :=∂

∂αJ [y + α δy]|α=0 (6.2)

where δy is the variation of the argument y of a functional J , i.e., δy := y1 − y. Thevariation (6.2) is also referred to as the first variation of a functional J .

6.1.1 Necessary condition for an extremum

A minimum or maximum of a functional is called an extremum. The necessary con-dition for an extremum of a functional J requires its variation to be zero, i.e., δJ = 0or by means of equation (6.2)

δJ :=∂

∂αJ [y + α δy]|α=0 = 0. (6.3)

Considering the simplest variational problem of finding an extremum of the func-tional (6.1) and using the above necessary condition (6.3), we have the following

Page 74: Krzysztof Tesch - Politechnika Gdańskakrzyte/students/optimisation_book.pdf · Continuous optimisation algorithms ... 4.4.4 Bat algorithm ... 7.6 Multi-objective description of Murray’s

6.1. Functional and its variation 73

integral

x2w

x1

(∂F

∂y− d

dx

∂F

∂y′+

d2

dx2

∂F

∂y′′

)δy dx+

(∂F

∂y′− d

dx

∂F

∂y′′

)δy|x2

x1+∂F

∂y′′δy′|x2

x1= 0.

(6.4)It is now possible to distinguish several problems based on prescribed conditions atthe endpoints xi. At least three cases are possible:• Both ends constrained. For i = 1 and i = 2, the appropriate variations in

equation (6.4) are

δy|xi = 0, (6.5a)

δy′|xi = 0 (6.5b)

meaning that at x1, x2 we have prescribed conditions for y(x1), y(x2) andderivatives y′(x1), y′(x2).

• One end constrained. As above for either i = 1 or i = 2. This also means thatthe second end is variable or unconstrained.

• Mixed problems. For i = 1 or i = 2 we have either

δy′|xi = 0, (6.6a)

δy|xi 6= 0 (6.6b)

or

δy′|xi 6= 0, (6.7a)

δy|xi = 0, (6.7b)

meaning that at x1, we only have prescribed conditions for y(x1). Additionally,at x2 we can have y′(x2) or conversely. This situation is referred to as partlyconstrained end or ends.

6.1.2 The Euler equation

If both ends are constrained, i.e., δy|xi = 0 and δy′|xi = 0, the necessary conditionfor an extremum of a functional J (6.1) is the Euler equation

∂F

∂y− d

dx

∂F

∂y′+

d2

dx2

∂F

∂y′′= 0 (6.8)

where y is a continuously differentiable function. The above equation is also referredto as the Euler-Lagrange equation. In general, equation (6.8) is the fourth orderordinary differential equation. This also means that the specific solution requiresfour boundary conditions y(x1), y(x2), y′(x1), y′(x2). As mentioned previously, thesolutions of the above equation are called extremals.

If the integrand F of the functional (6.1) does not depend on y′′ then it is possibleto provide a simpler form of J

J [y] =

x2w

x1

F (x, y, y′) dx. (6.9)

Page 75: Krzysztof Tesch - Politechnika Gdańskakrzyte/students/optimisation_book.pdf · Continuous optimisation algorithms ... 4.4.4 Bat algorithm ... 7.6 Multi-objective description of Murray’s

74 6. Variational calculus

The Euler equation (6.8) is now also simpler

∂F

∂y− d

dx

∂F

∂y′= 0 (6.10)

or explicitly∂F

∂y− ∂

∂x

∂F

∂y′− ∂

∂y

∂F

∂y′y′ − ∂2F

∂y′2y′′ = 0. (6.11)

In general, it is the second order ordinary differential equation. The specific solutionrequires two boundary conditions y(x1), y(x2). Furthermore, if the integrand F ofthe functional (6.9) does not depend on x, namely

J [y] =

x2w

x1

F (y, y′) dx (6.12)

the Euler equation (6.11) can now be integrated once to the Beltrami identity

F − y′ ∂F∂y′

= C (6.13)

where C is an integration constant.The simplest variational problem with functions of two variables is represented by

the following double integral

J [z] =x

Ω

F

(x, y, z,

∂z

∂x,∂z

∂y

)dxdy (6.14)

or using shorter notation (z′x := ∂z∂x and so on)

J [z] =x

Ω

F(x, y, z, z′x, z

′y

)dx dy. (6.15)

The Euler equation, expressing the necessary condition for an extremum of the abovefunctional, is

∂F

∂z− ∂

∂x

∂F

∂z′x− ∂

∂y

∂F

∂z′y= 0. (6.16)

Slightly more complicated variational problem with functions of two variables andhigher order derivatives is

J [z] =x

Ω

F

(x, y, z,

∂z

∂x,∂z

∂y,∂2z

∂x2,∂2z

∂x∂y,∂2z

∂y2

)dx dy (6.17)

orJ [z] =

x

Ω

F(x, y, z, z′x, z

′y, z′′xx, z

′′xy, z

′′yy

)dxdy. (6.18)

The Euler equation is

∂F

∂z− ∂

∂x

∂F

∂z′x− ∂

∂y

∂F

∂z′y+

∂2

∂x2

∂F

∂zxx+

∂2

∂x∂y

∂F

∂zxy+

∂2

∂y2

∂F

∂zyy= 0. (6.19)

Page 76: Krzysztof Tesch - Politechnika Gdańskakrzyte/students/optimisation_book.pdf · Continuous optimisation algorithms ... 4.4.4 Bat algorithm ... 7.6 Multi-objective description of Murray’s

6.2. Classic problems 75

6.1.3 Constraints

Different constraints imposed on the specific variational problem can be divided intoseveral categories:• Boundary conditions. If there are no boundary conditions, i.e., no prescribed

conditions at endpoints xi then the problem can be considered as unconstrained.Consequently, the Euler equation (6.8) should be solved with additional condi-tion (

∂F

∂y′− d

dx

∂F

∂y′′

)∣∣∣∣xi

= 0 (6.20)

and∂F

∂y′′

∣∣∣∣xi

= 0. (6.21)

These arise due to integral (6.4). If, however, certain conditions are prescribed atthe endpoint xi then depending on the specific variations δy|xi = 0 or δy′|xi = 0additional conditions (6.21) or (6.20) should be solved together with the Eulerequation.

• Integral constraint. The additional integral constraint is of the form

x2w

x1

G(x, y, y′, y′′) dx = C (6.22)

where G and C are known.• Non-integral constraint. The additional constraint is of the form

G(x, y, y′, y′′) = 0 (6.23)

where G is known.

6.2 Classic problems

6.2.1 Shortest path on a plane

The problem of finding a path on a plane of shortest length connecting two points isillustrated in figure 6.1. If there are no additional constraints imposed on the function(path), the problem is trivial.

The length |l| of a path or, in fact, a smooth curve l is given by the following lineintegral

|l| =w

l

dl. (6.24)

To be more precise, the above integral is the curvilinear integral of a scalar field. Inorder to evaluate the length let us assume first that the path l is given in explicitform, namely l := (x, y) : y(x) = . . . , x ∈ [x1;x2]. Since dl2 = dx2 + dy2 we have

|l| =x2w

x1

√1 + y′2 dx. (6.25)

Page 77: Krzysztof Tesch - Politechnika Gdańskakrzyte/students/optimisation_book.pdf · Continuous optimisation algorithms ... 4.4.4 Bat algorithm ... 7.6 Multi-objective description of Murray’s

76 6. Variational calculus

0 0.2 0.4 0.6 0.8 10

0.5

10 0.2 0.4 0.6 0.8 1

0

0.5

1

Figure 6.1: Shortest path on a plane

It is a functional of the (6.9) type and the integrand F depend only on y′

F (y′) :=√

1 + y′2. (6.26)

The explicit Euler equation (6.11) is

∂2F

∂y′2y′′ = 0. (6.27)

It is either ∂2F∂y′2 = 0 or y′′ = 0. Integrating the latter twice we have

y = C1 + C2x, (6.28)

i.e., a family of lines. The two constants C1, C2 can be determined from the knownboundary conditions (x1, y1), (x2, y2).

6.2.2 Brachistochrone

The problem was originally formulated in 1696 by Bernoulli and solved independentlyby Bernoulli, Newton, de l’Hopital. Brachistochrone is the curve joining two points(0, 0) and (x2, y2) of fastest descent, see figure 6.2. Meaning, that a material pointmoving frictionless under the influence of gravity along that curve, starting from (0, 0),reaches (x2, y2) in the shortest time.

0 0.5 1 1.5

0

0.5

1

x

y

0 0.5 1 1.50

0.5

1

Figure 6.2: Brachistochrone

Page 78: Krzysztof Tesch - Politechnika Gdańskakrzyte/students/optimisation_book.pdf · Continuous optimisation algorithms ... 4.4.4 Bat algorithm ... 7.6 Multi-objective description of Murray’s

6.2. Classic problems 77

The velocity of the material point is v = dldt and the time required for descent

along l is

t =w

l

dl

v. (6.29)

In order to evaluate the above line integral let us assume that the path l is givenin explicit form. Since the square of differential element of arc length dl is dl2 =dx2 + dy2 we have

t =

x2w

0

√1 + y′2

vdx. (6.30)

Velocity can be found by means of the law of conservation of energy

mv2

2= mgy, (6.31)

i.e., v =√

2gy. Finally, the time of travel t can be expressed as a certain functionalin the following form

t =1√2g

x2w

0

√1 + y′2

ydx. (6.32)

Moreover, the constant 1/√

2g can be neglected as it has no impact on the opti-mal path l. Consequently, equation (6.32) is a functional of the (6.9) type and theintegrand F does not depend on x

F (y, y′) :=

√1 + y′2

y. (6.33)

0 π 2π0

1

x

y

0 π 2π

0

1

Figure 6.3: Cycloid

The Euler equation (6.11) can now be integrated once to the Beltrami identity(6.13), namely

F − y′ ∂F∂y′

= C. (6.34)

In the case of the brachistochrone problem the integrand (6.33) provides the followingform of the Euler equation√

1 + y′2

y− y′2√

y(1 + y′2)= C. (6.35)

Page 79: Krzysztof Tesch - Politechnika Gdańskakrzyte/students/optimisation_book.pdf · Continuous optimisation algorithms ... 4.4.4 Bat algorithm ... 7.6 Multi-objective description of Murray’s

78 6. Variational calculus

This, however, can be reduced to

y(1 + y′2) = C. (6.36)

Eventually, the solution of the above equation is the cycloid. A cycloid (see figure6.3) is the curve traced by a circular band of radius R (circumference) as the bandrolls along a straight line

x(t) := R(t− sin t), (6.37a)

y(t) := R(1− cos t). (6.37b)

Surprisingly, the extremal is not a straight line.

6.2.3 Minimal surface of revolution

The area |S| of the surface of revolution S by rotating l around the x axis is given bythe following integral

|S| = 2πw

l

y dl. (6.38)

Assuming that the arc l is given in explicit form, namely l := (x, y) : y(x) = . . . , x ∈[x1;x2], it is possible to evaluate the above line integral. Since dl2 = dx2 + dy2 wehave

|S| = 2π

x2w

x1

y√

1 + y′2 dx. (6.39)

Again, equation (6.39) is a functional of the (6.9) type and the integrand F does notdepend on x

F (y, y′) := y√

1 + y′2. (6.40)

0 0.5 1 1.5 20

0.5

1

1.5

2

x

y

0 0.5 1 1.5 2

0

0.5

1

1.5

2

Figure 6.4: Catenary

The Euler equation (6.11) can be integrated once to the Beltrami identity (6.13).This results in

F − y′ ∂F∂y′

= C. (6.41)

Page 80: Krzysztof Tesch - Politechnika Gdańskakrzyte/students/optimisation_book.pdf · Continuous optimisation algorithms ... 4.4.4 Bat algorithm ... 7.6 Multi-objective description of Murray’s

6.2. Classic problems 79

In the case of the minimal surface area of revolution the integrand (6.40) provides thefollowing form of the Euler equation

y√

1 + y′2 − yy′2√1 + y′2

= C1. (6.42)

The solution of the above equation is the catenary shown in figure 6.4

y = C1 coshx− C2

C1. (6.43)

Figure 6.5 presents the minimal surface of revolution – catenoid, i.e., the surface ofrevolution by rotating the catenary around the x axis.

−2−1

01

2−4−20

24

−4

−2

0

2

4

xy

z

Figure 6.5: Catenoid

6.2.4 Isoperimetric problem

The problem is to find the maximal encircled area among all closed planar curves ofconstant length (perimeter). Interestingly, the problem is known since antiquity. Forthe sake of simplicity let y1 = y2 = 0, see figure 6.6. The area of the region below lis defined as

|Ω| =x2w

x1

y dx (6.44)

being a functional of the (6.9) type and the integrand F depends only on y, namelyF (y) := y. The isoperimetric condition

|l| =w

l

dl (6.45)

states that the length of curve l := (x, y) : y =?; x1 ≤ x ≤ x2; y1 = y2 = 0 isconstant and equal |l|. According to paragraph 6.1.3 equation (6.45) can be classified

Page 81: Krzysztof Tesch - Politechnika Gdańskakrzyte/students/optimisation_book.pdf · Continuous optimisation algorithms ... 4.4.4 Bat algorithm ... 7.6 Multi-objective description of Murray’s

80 6. Variational calculus

as integral constraint. Since the square of differential element of arc length dl2 =dx2 + dy2 we have

|l| =x2w

x1

√1 + y′2 dx. (6.46)

It is again a functional of the (6.9) type and this time the integrand F1 depends onlyon y′

F1(y′) :=√

1 + y′2. (6.47)

00

(x1, y1) (x2, y2)

x

y

Figure 6.6: Maximal area encircled

In order to find the necessary condition for an extremum (the Euler equation) itis necessary to formulate a new integrand

F2(y, y′) := F + λF1 (6.48)

where λ is a Lagrange multiplier. The functional is

J := |Ω|+ λ|l| (6.49)

or equivalently

J =

x2w

x1

F2 dx =

x2w

x1

(y + λ

√1 + y′2

)dx. (6.50)

Since the integrand F2 does not depend on x, the Euler equation (6.11) can be inte-grated once to the Beltrami identity (6.13)

F2 − y′∂F2

∂y′= C. (6.51)

In the case of the maximal encircled area the integrand F2 provides the following formof the Euler equation

y + λ√

1 + y′2 − λ y′2√1 + y′2

= C1. (6.52)

Furthermore, this can be reduced to

y − C1 =−λ√

1 + y′2. (6.53)

Page 82: Krzysztof Tesch - Politechnika Gdańskakrzyte/students/optimisation_book.pdf · Continuous optimisation algorithms ... 4.4.4 Bat algorithm ... 7.6 Multi-objective description of Murray’s

6.2. Classic problems 81

The solution of the above equation is a family of circles

(x− C1)2 + (y − C1)2 = λ2. (6.54)

The three constants, i.e. C1, C2, λ can be easily determined from known boundarypoints (x1, y1), (x2, y2) and the isoperimetric condition (6.46)

6.2.5 Geodesics

The nontrivial problem is to find the shortest path on a surface. Geodesics, in theoriginal sense, are the shortest path between two points on a nonplanar surface S, seefigure 6.7. What is more, this is a constrained variational problem since the unknownpath is constrained to lie on a nonplanar surface.

−4 −20 2 4

−4−20

24

−4

−2

0

2

4

xy

z

Figure 6.7: Geodesics on a sphere

The equation for the length of a curve l is given by the curvilinear integral of ascalar field, namely

|l| =w

l

dl (6.55)

where the curve l itself in space is given by the parametric representation

l := (x, y, z) : x(t) := . . . ; y(t) := . . . ; z(t) := . . . ; t ∈ [t1; t2]. (6.56)

Since the square of differential element of arc length dl is dl2 = dx2 + dy2, it ispossible to evaluate the curvilinear integral (6.55)

|l| =t2w

t1

√x′2 + y′2 + z′2 dt. (6.57)

The integrand F is now

F :=√x′2 + y′2 + z′2. (6.58)

Page 83: Krzysztof Tesch - Politechnika Gdańskakrzyte/students/optimisation_book.pdf · Continuous optimisation algorithms ... 4.4.4 Bat algorithm ... 7.6 Multi-objective description of Murray’s

82 6. Variational calculus

The nonplanar surface is S := (x, y, z) : f(x, y, z) = 0 where f(x, y, z) = 0 is theconstraining equations. In order to find the necessary condition for an extremum weformulate a new integrand

F2 = F − λ(t)f (6.59)

or a new functional

J =

t2w

t1

(√x′2 + y′2 + z′2 − λ(t)f

)dt. (6.60)

The Euler equations are

∂F2

∂x− d

dt

∂F2

∂x′= 0, (6.61a)

∂F2

∂y− d

dt

∂F2

∂y′= 0, (6.61b)

∂F2

∂z− d

dt

∂F2

∂z′= 0 (6.61c)

or

λ(t)∂f

∂x+

d

dt

x′√x′2 + y′2 + z′2

= 0, (6.62a)

λ(t)∂f

∂y+

d

dt

y′√x′2 + y′2 + z′2

= 0, (6.62b)

λ(t)∂f

∂z+

d

dt

z′√x′2 + y′2 + z′2

= 0. (6.62c)

−2−1

01

2−4−20

24

−4

−2

0

2

4

xy

z

Figure 6.8: Geodesics on a cylinder

Let us consider a circular cylinder of radius R. It can be expressed by the followingimplicit equation

f(x, y, z) := x2 + y2 −R2 = 0. (6.63)

Page 84: Krzysztof Tesch - Politechnika Gdańskakrzyte/students/optimisation_book.pdf · Continuous optimisation algorithms ... 4.4.4 Bat algorithm ... 7.6 Multi-objective description of Murray’s

6.2. Classic problems 83

The solution of the Euler equations are helices, meaning that the geodesics on acircular cylinder are

x(t) := R cos t, (6.64a)

y(t) := R sin t, (6.64b)

z(t) := k t. (6.64c)

6.2.6 Minimal surface passing through a closed curve in space

The nontrivial problem is to find the surface of least total area stretched across aclosed curve in space. For a planar curve the problem is trivial again.

−4−2

02

4−1

0

1

−1

0

1

xy

z

Figure 6.9: Helicoid

Let there be a nonplanar surface S passing through a given closed curve ∂S andwhose projection onto the xy plane is D, i.e.,

S := (x, y, z) : z(x, y) = . . . ; (x, y) ∈ D . (6.65)

The minimum of the following surface integral of scalar fields

|S| =x

S

dS (6.66)

is the solution to the problem of the minimal surface in space. Alternatively, theabove functional can be expressed by means of the double integral

|S| =x

D

√1 + z′2x + z′2y dxdy. (6.67)

It is a functional of the (6.15) type and the Euler equations (6.16) can now be reducedto

∂x

z′x√1 + z′2x + z′2y

+∂

∂y

z′y√1 + z′2x + z′2y

= 0 (6.68)

Page 85: Krzysztof Tesch - Politechnika Gdańskakrzyte/students/optimisation_book.pdf · Continuous optimisation algorithms ... 4.4.4 Bat algorithm ... 7.6 Multi-objective description of Murray’s

84 6. Variational calculus

or explicitly

∂2z

∂x2

(1 +

(∂z

∂y

)2)− 2

∂z

∂x

∂z

∂y

∂2z

∂x∂y+∂2z

∂y2

(1 +

(∂z

∂x

)2)

= 0. (6.69)

In order to find the minimum surface S one has to solve the above nonlinear, secondorder partial differential equation. Obviously, a planar surface is also a solution ofequation (6.69). Furthermore, the nontrivial minimal surfaces are catenoid (figure6.5) and helicoid (figure 6.9).

6.2.7 Variational formulationof elliptic partial differential equations

The problem of finding the minimum of the functional satisfying certain boundarycondition over ∂Ω

J [z] =x

Ω

12

(z′2x + z′2y

)dx dy (6.70)

leads to the Laplace equation

∂2z

∂x2+∂2z

∂y2= 0. (6.71)

The harmonic functions, being the solution of equation (6.71), are extremals of thefunctional (6.70). Furthermore, the necessary condition for an extremum of the fol-lowing functional

J [z] =x

Ω

(z′2x + z′2y − 2zf(x, y)

)dxdy (6.72)

results in the Poisson equation

∂2z

∂x2+∂2z

∂y2= f(x, y). (6.73)

Finally, the problem of finding the minimum of the functional

J [z] =x

Ω

(z′′2xx + 2z′′2xy + z′′2yy − 2zf(x, y)

)dxdy (6.74)

leads to the biharmonic equation

∂4z

∂x4+

∂4z

∂x2∂y2+∂4z

∂y4= f(x, y). (6.75)

Variational formulation of elliptic partial differential equations forms the foundationsfor the various approximate method.

Page 86: Krzysztof Tesch - Politechnika Gdańskakrzyte/students/optimisation_book.pdf · Continuous optimisation algorithms ... 4.4.4 Bat algorithm ... 7.6 Multi-objective description of Murray’s

6.3. Variational method of finding streamlines in ring cascades for creeping flows 85

6.3 Variational method of finding streamlines in ringcascades for creeping flows

6.3.1 Introduction

Creeping, steady state flow is considered here together with the additional assumptionof axial symmetry. Creeping flow occurs when the Reynolds number Re 1. Thiscondition, however, is not satisfied for typical technical applications in cascade flows.Therefore, one has to keep in mind that methods presented here are mostly of cognitivevalues. A very important feature of creeping flows is worth mentioning here, i.e., theyare characterised by the minimum possible dissipation.

γ1

|γ1 −

γ2 |

−β

−α−γ2

τ

R1

R2

Figure 6.10: Ring cascade scheme

6.3.2 Conservation equation in curvilinear coordinate systems

Because of the shape of the cascade, see figure 6.10, it is most convenient to expressthe conservation equations in a coordinate system in which they arrive in the simplestform. The mass conservation equation for the incompressible case does not simplify,however, under the Re 1 assumption.

Introducing the incompressibility and constant viscosity µ assumptions as wellas steady state character of the flow, the mass conservation equation and the twocomponents of the Stokes equations in cylindrical coordinates (or polar on a plane)read

∂r(rUr) +

∂Uϕ∂ϕ

= 0, (6.76a)

1

µ

∂p

∂r=

1

r

∂r

(r∂Ur∂r

)+

1

r2

∂2Ur∂ϕ2

− Urr2− 2

r2

∂Uϕ∂ϕ

, (6.76b)

1

µ

∂p

∂ϕ=

∂r

(r∂Uϕ∂r

)+

1

r

∂2Uϕ∂ϕ2

− Uϕr

+2

r

∂Ur∂ϕ

. (6.76c)

Page 87: Krzysztof Tesch - Politechnika Gdańskakrzyte/students/optimisation_book.pdf · Continuous optimisation algorithms ... 4.4.4 Bat algorithm ... 7.6 Multi-objective description of Murray’s

86 6. Variational calculus

The above system (6.76) is closed. The unknown functions are the velocity compo-nents Ur, Uϕ and pressure p. The uniqueness of this system with prescribed boundaryconditions was first proved by Helmholtz.

The concept of the stream function ψ can be introduced according to the followingdefinitions Ur = 1

r∂ψ∂ϕ and Uϕ = −∂ψ∂r . The alternative definitions Ur = − 1

r∂ψ∂ϕ and

Uϕ = ∂ψ∂r are also possible. Both definitions satisfy the mass conservation equation

(6.76a). After differentiating equation (6.76b) with respect to ϕ and (6.76c) with rand subsequent subtracting one from the other, we obtain

1

r

∂r

(r∂

∂r

(1

r

∂r

(r∂ψ

∂r

)))+

2

r2

∂4ψ

∂r2∂ϕ2+

1

r4

∂4ψ

∂ϕ4− 2

r3

∂3ψ

∂r∂ϕ2+

4

r4

∂2ψ

∂ϕ2= 0, (6.77)

The above equation is the so called biharmonic equation in polar coordinates. Ashorter version of this equation reads ∇4ψ = 0. It must be point out that althoughequation (6.77) corresponds to the system (6.76), it is of fourth order.

6.3.3 Dissipation function and dissipation power

The strain rate tensor in polar, physical coordinates, takes the form

D =

(∂Ur∂r

12∂Uϕ∂r + 1

2r∂Ur∂ϕ −

Uϕ2r

12∂Uϕ∂r + 1

2r∂Ur∂ϕ −

Uϕ2r

1r∂Uϕ∂ϕ + Ur

r

). (6.78)

The same tensor expressed in terms of stream function ψ reads

D =

(1r∂2ψ∂r∂ϕ − 1

r2∂ψ∂ϕ

12r2

∂2ψ∂ϕ2 + 1

2r∂ψ∂r − 1

2∂2ψ∂r2

12r2

∂2ψ∂ϕ2 + 1

2r∂ψ∂r − 1

2∂2ψ∂r2

1r2∂ψ∂ϕ − 1

r∂2ψ∂r∂ϕ

). (6.79)

By means of this tensor it is possible to express the dissipation function φµ = 2µD2

as

φµ =µ

r4

(4

(∂ψ

∂ϕ− r ∂

∂r∂ϕ

)2

+

(∂2ψ

∂ϕ2+ r

(∂ψ

∂r− r ∂

∂r2

))2). (6.80)

The dissipated power is defined as

Nd =x

Ω

φµr dr dϕ, (6.81)

where the considered flow domain Ω is the following subset of the plane Ω := (r, ϕ) :r ∈ [R1, R2];ϕ ∈ [0, τ ].

6.3.4 Analytical solutions

An analytical solution of the system (6.76) for an axially symmetric geometry ispossible. This case can also be regarded as a cascade composed of an infinite numberof infinitely thin blades. Formally, it is the case where all the streamlines are identicalwith respect to rotation around the symmetry axis. From this arises an additionalassumption, i.e., ∂

∂ϕ = 0.

Page 88: Krzysztof Tesch - Politechnika Gdańskakrzyte/students/optimisation_book.pdf · Continuous optimisation algorithms ... 4.4.4 Bat algorithm ... 7.6 Multi-objective description of Murray’s

6.3. Variational method of finding streamlines in ring cascades for creeping flows 87

It may be shown that for axial symmetry, there exists a solution of the system(6.76). This system simplifies now to

d

dr(rUr) = 0, (6.82a)

1

µ

dp

dr=

1

r

d

dr

(r

dUrdr

)− Urr2, (6.82b)

0 = rd2Uϕdr2

+dUϕdr− Uϕ

r. (6.82c)

We are dealing with ordinary differential equations. The first of them, i.e., (6.82a)can be integrated and gives the analytical solution Ur = c1r

−1. This solution canbe substituted into equation (6.82b). This results in dp

dr = 0, which means thatpressure is constant inside the entire flow domain p = c2. The last equation (6.82c)is simply an ordinary differential equation in terms of Uϕ. Its solution takes the formUϕ = c3r

−1 + c4r. Finally, the system (6.82) is integrated to

Ur =c1r, (6.83a)

p = c2, (6.83b)

Uϕ =c3r

+ c4r. (6.83c)

In view of the axial symmetry, the biharmonic equation ∇4ψ = 0 (6.77) simplifiesto

1

r

d

dr

(r

d

dr

(1

r

d

dr

(r

dr

)))= 0. (6.84)

This is also the case with the strain rate tensor (6.79), which takes the following form

D =

(∂Ur∂r

12∂Uϕ∂r −

Uϕ2r

12∂Uϕ∂r −

Uϕ2r

Urr

). (6.85)

Following the same line of reasoning, the dissipation function (6.80) simplifies to

φµ =µ

r4

(2

(∂ψ

∂ϕ

)2

+ 2

(∂ψ

∂ϕ− r ∂

∂r∂ϕ

)2

+ r2

(∂ψ

∂r− r ∂

∂r2

)2). (6.86)

6.3.5 Dissipation functional

The assumption of axial symmetry results in a set of identical streamlines f whichdepend only on the coordinate r. The following form of stream function ψ may beproposed

ψ(r, ϕ) :=ϕ− f(r)

τ. (6.87)

It cannot be determine whether this function satisfies the biharmonic equation (6.84),since the function f is unknown. The problem is now reduced to the search for the

Page 89: Krzysztof Tesch - Politechnika Gdańskakrzyte/students/optimisation_book.pdf · Continuous optimisation algorithms ... 4.4.4 Bat algorithm ... 7.6 Multi-objective description of Murray’s

88 6. Variational calculus

single variable function f instead of two-variable function ψ. The form of ψ (6.87) isfully determined by f .

The dissipation function (6.86) or (6.80) takes the following form by virtue of(6.87)

φµ =µ

r4τ2

(4 + r2 (f ′ − rf ′′)2

). (6.88)

The dissipation power (6.81) may now be rewritten as an iterated integral for anypitch τ (see figure 6.10)

Nd =

τw

0

R2w

R1

φµr dr dϕ. (6.89)

What is important, is that the form (6.87) allows us to integrate the dissipation poweronce. This is because it explicitly depends on ϕ. On the basis of equation (6.88) and(6.89), we have

Nd =µ

τ

R2w

R1

1

r3

(4 + r2 (f ′ − rf ′′)2

)dr. (6.90)

The above integral is a certain functional which depends on the radius r and thefunction f together with its derivatives (up to the second). Symbolically, it is writtenas

N [f ] =

R2w

R1

F (r, f, f ′, f ′′) dr. (6.91)

The necessary condition for the optimum of this functional, in the general case withunconstrained ends, takes the form (6.8). Therefore the optimisation problem consistsin the search for a streamline f which minimises the functional (6.91). The form of thefunction f results from the necessary condition (6.8). This condition can be simpler,if certain additional assumptions are introduced. This is discussed later.

6.3.6 Dissipation functional vs. equations of motion

The method presented here consists in choosing the function f (streamlines) whichwould minimise the functional (6.90). However, the essential question is whetherthe solution obtained by minimising the functional satisfies the equations of motion(6.82). In order to answer this question, we need the functional which yields the Stokesequations as a result of a necessary condition. The general form of this functional is

J =x

Ω

(ρ∂U

∂t·U− ρg ·U− p∇ ·U + µD2

)dΩ. (6.92)

The necessary condition δJ = 0 yields the Stokes equations ρ∂U∂t = ρg−∇p+µ∇2U.

In this case we deal with steady state flow ∂∂t = 0 and we neglect mass forces. If so,

then the functional (6.92) simplifies to

J =x

Ω

(−p∇ ·U + µD2

)dΩ. (6.93)

Page 90: Krzysztof Tesch - Politechnika Gdańskakrzyte/students/optimisation_book.pdf · Continuous optimisation algorithms ... 4.4.4 Bat algorithm ... 7.6 Multi-objective description of Murray’s

6.3. Variational method of finding streamlines in ring cascades for creeping flows 89

From the necessary condition we obtain equation of motion in absolute notation ∇p =µ∇2U. We also know from solution (6.83) of the system (6.82) that pressure isconstant and therefore ∇p = 0 and J =

sΩµD2 dΩ. This means that 2J = Nd where

Nd is defined by means of (6.88)–(6.90). This simply guaranties that the minimisationof the dissipation functional Nd, which yields the streamlines f , leads to a solutionthat satisfies the Stokes equation (for a constant pressure). Additionally, one maytake under consideration only the cases with one end unconstrained and with bothends partly constrained (when angles are known). This will be discussed further.The above reasoning does not apply to the Navier-Stokes equations, since they arenonlinear and there is no classical variational formulation such as (6.92). However,there is a non-classical variational formulation which can be used for the nonlinearNavier-Stokes equations. This also means that dissipation is not the only componentof the functional and there is no guarantee that the streamlines f , which arise fromthe minimisation of the functional, satisfy the equations of motion.

6.3.7 Streamlines

6.3.7.1 Both ends constrained

Both ends constrained means that the angle α and position γ2 are known at the inletand the angle β and position γ1 (figure 6.10) are known at the outlet. From thenecessary condition (6.8) we obtain the Euler equation in the following form

∂F

∂f− d

dr

∂F

∂f ′+

d2

dr2

∂F

∂f ′′= 0. (6.94)

Since both ends are constrained, so the appropriate variations δf |Ri = 0 and δf ′|Ri =0. From the Euler equation (6.94) for functional F , we obtain an ordinary differentialequation of the fourth order

f ′

r3− f ′′

r2+

2f ′′′

r+ f IV = 0. (6.95)

This equation should be solved together with the following boundary conditionsf(R1) = γ1, f(R2) = γ2, f ′(R1) = tanβ, f ′(R2) = tanα. The general solutionof equation (6.95) is the function f (streamline)

f(r) := C1 + C2r2 + C3 ln r + C4r

2 ln r. (6.96)

It can be easily verified that the solution (6.96) satisfies the biharmonic equation(6.84). After calculating the stream function (6.87), velocities Ur, Uϕ and pressure p,it arises that the second equation of motion (6.82b) gives 0 = 4C4τ

−1. This meansthat the pressure does not satisfies the axial symmetry condition and the problemwith both ends constrained it too general (too stiff). In addition, all the followingsolutions must satisfy the condition C4 = 0. Only then, the axial symmetry conditionis satisfied for all the variables. Finally, the most general form of the solution ofequation (6.95) has the form

f(r) := C1 + C2r2 + C3 ln r. (6.97)

Page 91: Krzysztof Tesch - Politechnika Gdańskakrzyte/students/optimisation_book.pdf · Continuous optimisation algorithms ... 4.4.4 Bat algorithm ... 7.6 Multi-objective description of Murray’s

90 6. Variational calculus

6.3.7.2 One end partly constrained

Two cases are possible. In the first one we know of the one of angles α or β. In thesecond, we know position γ1 or γ2.

Known angle. To be more precise, we know both angles: the inlet angle α andthe outlet angle β. We look for one of the positions γi. This requires δf ′|Ri = 0 andδf |R1

6= 0 or δf |R26= 0. From the necessary condition (6.4) we obtain an additional

equation (∂F

∂f ′− d

dr

∂F

∂f ′′

)∣∣∣∣r=Ri

= 0. (6.98)

The solution must satisfy this condition together with the Euler equation (6.94). Itcan be shown that the additional condition (6.98) for the functional F can be reducedto −4C4r

−2∣∣Ri

= 0 which yields C4 = 0. Therefore, the solution (6.96) takes the

form (6.97). This means that the problem with one end partly constrained (withconstrained position) is well formulated.

The known position serves as a reference point and its value has no significance,owing to the axial symmetry of the function f . The boundary conditions take theform of f ′(R1) = tanβ, f ′(R2) = tanα. For a matter of simplicity, the additionalreference point may be assumed as f(R2) = 0. In so doing we deal with two partlyconstrained ends (constrained position). The solution of equation (6.97) together withthe discussed boundary condition has the form

f(r) :=2R1R2 (R1 tanα−R2 tanβ) ln r

R2−(r2 −R2

2

)(R2 tanα−R1 tanβ)

2 (R21 −R2

2).

(6.99)By using formula (6.87) and the definition of the stream function it can be shown thatvelocity Ur = τ−1r−1. This means that the constant c1 in equation (6.83a) equalsc1 = τ−1. The velocity is then

Uϕ =1

τ (R21 −R2

2)

(r (R1 tanβ −R2 tanα) +

R1R2

r(R1 tanα−R2 tanβ)

),

(6.100)which means that constants in equation (6.83c) takes the form

c3 =R1R2 (R1 tanα−R2 tanβ)

τ (R21 −R2

2), (6.101a)

c4 =R1 tanβ −R2 tanα

τ (R21 −R2

2). (6.101b)

Exemplary streamlines, obtained from equation (6.99), are shown in figure 6.11, whereα = −80. The outlet angles vary from −80 to 80.

Known position. More precisely, both positions are known: the inlet γ2 and theoutlet γ1. We look for either the inlet angle α or the outlet angle β. This requires that

Page 92: Krzysztof Tesch - Politechnika Gdańskakrzyte/students/optimisation_book.pdf · Continuous optimisation algorithms ... 4.4.4 Bat algorithm ... 7.6 Multi-objective description of Murray’s

6.3. Variational method of finding streamlines in ring cascades for creeping flows 91

variations δf |Ri = 0 and δf ′|R16= 0 or δf ′|R2

6= 0. From the necessary condition(6.4) we obtain an additional equation in the following form

∂F

∂f ′′

∣∣∣∣r=Ri

= 0, (6.102)

which must be satisfied together with the Euler equation (6.94). The additionalequation (6.102) for the functional F simplifies to C3 = C4r

2∣∣Ri

. This lead to thefollowing form of the streamline

f(r) := C1 + C2r2 + C4

(R2i + r2

)ln r. (6.103)

The above solution does not posses the admissible form (6.97). This means thatpressure is not axially symmetric. Therefore, the case with one end partially con-strained (in the form of known angle) is too stiff and hence not well formulated(C3 = C4R

2i 6= 0).

−2 −1 0 1 2−2

−1

0

1

2

x

y

−2 −1 0 1 2

−2

−1

0

1

2

Figure 6.11: Streamlines as a function ofβ for α = −80 for R2

R1= 4

−2 −1 0 1 2−2

−1

0

1

2

x

y−2 −1 0 1 2

−2

−1

0

1

2

Figure 6.12: Streamlines as a function ofinlet angle α for R2

R1= 4

6.3.7.3 One end unconstrained

Here we know either the inlet position γ2 and the inlet angle α or the outlet positionγ1 together with the outlet angle β. This requires that variations δf ′|Ri = 0 andδf |Ri 6= 0. Apart from the Euler equations (6.95), additional conditions (6.98) and(6.102) must be fulfilled. This is the combination of the two previously discussedcases, where C3 = C4R

2i = 0. From equation (6.96) follows the general solution

f(r) := C1 + C2r2. (6.104)

The specific solution of (6.104) must satisfy the following boundary conditions f(Ri) =γi, f

′(Ri) = tan∠ where Ri ∈ R1, R2, ∠ ∈ α, β. From this conditions we obtainthe specific solution

f(r) := γi +r2 −R2

i

2Ritan∠. (6.105)

Page 93: Krzysztof Tesch - Politechnika Gdańskakrzyte/students/optimisation_book.pdf · Continuous optimisation algorithms ... 4.4.4 Bat algorithm ... 7.6 Multi-objective description of Murray’s

92 6. Variational calculus

The solution is valid both for the unconstrained inlet and the unconstrained outlet.From equation (6.87) and the definition of stream function, it follows that velocityUr = τ−1r−1, which means that c1 in equation (6.83a) c1 = τ−1. The velocityUϕ = rR−1

i τ−1 tan∠, which means that constants in teh solution (6.83c) take theform c3 = 0 i c4 = R−1

i τ−1 tan∠.

The dissipation power as a function of the inlet or outlet angle can be calculatedon the basis of equations (6.90) and (6.105). In both cases, this power is constantand for a pitch τ = 2π it equals

Nd =µ

π

(1

R21

− 1

R22

). (6.106)

The streamlines corresponding to the solution (6.105) are shown in figure 6.12. Thisis the case with the unconstrained outlet. The shortest streamlines are obtained forthe angles 0. The larger the angle the longer the wraparound angle.

6.3.8 Summary

It is possible to find an analytical solution of the Stokes equation for an axially sym-metric geometry in terms of velocity field. This can be done by direct integrationof the system (6.82). Furthermore, it is even possible to find a solution of the bi-harmonic equation (6.84) using the proposed decomposition of the stream function(6.87). Basing on these solutions, one cannot determine whether further relaxations ofthe inlet and outlet are possible, since there are no additional conditions that can beimposed on the solution. A far more general method is presented here that allows toovercome these difficulties. This method consists in the minimisation of a dissipationfunctional by means of the variational calculus. This allows to formulate additionalconditions to be imposed on the solution. Moreover, this method allows to obtainfurther solutions depending on how the inlet and the outlet are fixed and to find thesolutions which are too stiff.

6.4 Minimum drag shape bodies moving in inviscidfluid

6.4.1 Problem formulation

Figure 6.13 presents a moving object is steady, inviscid and incompressible fluid at aconstant speed U . The resistance is considered only on the peripheral of a moving ob-ject. Two cases are considered here, namely two-dimensional and three-dimensional.As for the former, the shape of the body is symmetrical with respect to the x axis,while in the latter one deals with an axial-symmetry with respect to the same axis x.

Page 94: Krzysztof Tesch - Politechnika Gdańskakrzyte/students/optimisation_book.pdf · Continuous optimisation algorithms ... 4.4.4 Bat algorithm ... 7.6 Multi-objective description of Murray’s

6.4. Minimum drag shape bodies moving in inviscid fluid 93

y

x

y0

x0

y(x) :=? (x0, y0)

U

Un

αα

Figure 6.13: Moving object

6.4.2 Fluid Resistance

6.4.2.1 Drag force

Drag force is the components, directed towards the body velocity, of the total forceR exerted on the moving body by the fluid. Formally, the total force is defined bymeans of a surface integral of stress vector over a considered body’s surface S. In theabsence of viscous (tangential) stresses this can be expressed by means of the normalstresses contribution

R = −x

S

(p− p∞)n dS. (6.107)

For further consideration it is necessary to specify a unit normal vector n to thesurface S. If the surface S is given explicitly S := (x, y, z) : z = f(x, y) or implicitlyS := (x, y, z) : F (x, y, z) := f(x, y)− z = 0 the unit normal vector is

n = − ∇F|∇F | =(−zx,−zy, 1)√

1 + z2x + z2

y

. (6.108)

The above formula takes simpler form for both: two-dimensional and three-dimensionalaxisymmetric case

n =(−y′, 1)√

1 + y′2, (6.109)

where the unit normal vector n applies to the curve l which is given by l := (x, y) :y = f(x) or l := (x, y) : F (x, y) := f(x)− y = 0.

6.4.2.2 Pressure coefficients and its approximation

In order to determine drag force it is necessary to know the distribution of pressuredifference p− p∞. The pressure may be determined when, for instance, the pressurecoefficient distribution cp is known

cp :=p− p∞12ρU

2∞. (6.110)

The Newtonian approximation for the distribution of pressure coefficient is given bycp = 2 sin2 α and together with definition (6.110) yields p − p∞ = ρU2

∞ sin2 α. This

Page 95: Krzysztof Tesch - Politechnika Gdańskakrzyte/students/optimisation_book.pdf · Continuous optimisation algorithms ... 4.4.4 Bat algorithm ... 7.6 Multi-objective description of Murray’s

94 6. Variational calculus

makes it possible to determine the optimal shape of a body moving in inviscid fluidin the sense of minimum drag.

6.4.3 Two-dimensional problem

In the case of two-dimensional flows equation (6.107) is reduced to the following form

R := −w

l

(p− p∞)n dL = −ρU2∞

w

l

n sin2 α dL, (6.111)

where the curve l starts from (0, 0) and ends at (x0, y0), see figure 6.13. From thesame figure it arises another geometrical relation, namely

sinα =dy√

dx2 + dy2=

y′√1 + y′2

. (6.112)

In order to convert curvilinear integral to single integral it is necessary to take ad-vantage of arc differential dl =

√1 + y′2 dx.

0 0.2 0.4 0.6 0.8 1−1

−0.5

0

0.5

1

x+

y+

0 0.2 0.4 0.6 0.8 1

−1

−0.5

0

0.5

1

x0.46

x

x0.75

Bezier

Figure 6.14: Optimal shapes

Drag force Rx comes directly from equations (6.109) and (6.111)

Rx := ρU2∞

x0w

0

y′3

1 + y′2dx (6.113)

and can be interpreted as a certain functional J . The specific value of that func-tional depends on a curve of interest y. A constant value ρU2

∞ is regarded here as amultiplier. This leads to the following form of a functional J

J [y] :=

x0w

0

y′3

1 + y′2dx =

x0w

0

F (y, y′) dx. (6.114)

The necessary condition for the optimum of the above functional J results in theEuler equation

Fy −d

dxFy′ = 0. (6.115)

Page 96: Krzysztof Tesch - Politechnika Gdańskakrzyte/students/optimisation_book.pdf · Continuous optimisation algorithms ... 4.4.4 Bat algorithm ... 7.6 Multi-objective description of Murray’s

6.4. Minimum drag shape bodies moving in inviscid fluid 95

From the Euler equation (6.115) for the functional J we obtain an ordinary differential equation y′y″(y′² − 3) = 0. There are four solutions to this equation. The first, trivial solution y = C₁ does not fulfil the boundary conditions. For the specific case when C₁ = 0 and y(x) := 0 one obtains a solution characterised by zero drag. The second and third solutions, namely y(x) := C₁ ± √3 x, do not fulfil one boundary condition in general. For instance, if y(0) = 0 then C₁ = 0 and it is typically impossible to fulfil y(x₀) = y₀. The fourth solution y(x) := C₁ + C₂x satisfies both boundary conditions y(0) = 0, y(x₀) = y₀. The constants are determined to be C₁ = 0, C₂ = y₀x₀⁻¹. This results in the following solution

y/y₀ = x/x₀. (6.116)

Introducing the dimensionless variables x⁺ := x x₀⁻¹ and y⁺ := y y₀⁻¹ we have the somewhat simpler form y⁺ = x⁺. The above solution is shown in figure 6.14. It is simply a straight line. Furthermore, an isosceles triangle is the two-dimensional body of minimum drag.

6.4.4 Three-dimensional problem

6.4.4.1 Functional and Euler equation

In the case of two-dimensional surfaces of revolution S, equation (6.107) is reduced to the following form

R := −2π ∫_l (p − p∞) n y dL (6.117)

where the curve l is the periphery (generating curve) of the surface S. Equation (6.117) is valid for axisymmetric problems. This results in a drag force equation Rx different from equation (6.113)

Rx := 2πρU_∞² ∫_0^{x₀} y′³y/(1 + y′²) dx. (6.118)

Neglecting the constant multiplier 2πρU_∞² it is now possible to express the functional J as

J[y] := ∫_0^{x₀} y′³y/(1 + y′²) dx. (6.119)

Following the same line of reasoning we have the necessary condition for the optimum of the above functional J. This results in the same Euler equation (6.115). This time, however, we obtain a slightly more complicated ordinary differential equation

y′( y y″(y′² − 3) − y′² − y′⁴ ) = 0. (6.120)

This is an obvious consequence of the more complex form of the functional J. The solution to this equation has to fulfil the same boundary conditions as previously, y(0) = 0, y(x₀) = y₀.


6.4.4.2 Exact pseudo solution

A solution to the nonlinear equation (6.120) may not be unique. It can, however, be integrated once by setting simultaneously y′ = u(y) and y″ = u′u. This leads to another nonlinear equation, C₁ y y′³ = (1 + y′²)², this time of the first order. This equation can be classified as a Lagrange equation and the parametric solution is

x(p) := (1/C₁)( 3/(4p⁴) + 1/p² + ln p + C₂ ), (6.121a)

y(p) := (1 + p²)² / (C₁p³), (6.121b)

where p := y′ is a parameter and the derivative at the same time. This can be easily demonstrated keeping in mind that dy/dx = (dy/dp)/(dx/dp) = p. The unknown constants in equation (6.121) should satisfy the boundary conditions. Solution (6.121) is of no practical value. This is because of the nonlinear and non-unique character of equation (6.120). It is possible to find the optimal solution only within the range x⁺ ∈ [0.1, 1]. Consequently, it is impossible to find the most interesting part of the solution around x⁺ ≈ 0.

6.4.4.3 Approximate solution due to the functional

The differential equation (6.120) for the functional has a complicated form. This means that it is extremely difficult to give an explicit solution. The classic approach to this problem is to simplify the form of the functional (6.119). It is assumed that y′² ≪ 1 and hence y′² + 1 ≈ 1. This assumption is not true when x⁺ ≈ 0, where one would expect y′(x) → ∞ as x → 0⁺. This is because of the smoothness of the axisymmetric solution. However, far from the stagnation point the discussed simplification is justified. If so, then the form of the functional (6.119) can be simplified to

J[y] := ∫_0^{x₀} y′³y dx. (6.122)

The necessary condition for the optimum of the above functional J results in a simpler Euler equation, y′(y′² + 3y y″) = 0. There are two solutions to this equation. The first, trivial solution y = C₁ does not fulfil the boundary conditions. The second solution is y(x) := C₂(4x − 3C₁)^{3/4}. Applying the boundary conditions y(0) = 0, y(x₀) = y₀ we have

y/y₀ = (x/x₀)^{3/4}. (6.123)

The same in dimensionless variables yields y⁺ = (x⁺)^{3/4}. Solution (6.123) has the form of a parabola and it is shown in figure 6.14.

6.4.4.4 Approximate solution due to form of the function

It can be easily verified that the curve y⁺ = (x⁺)^{1/2} gives an even smaller value of the original functional (6.119) in comparison with y⁺ = (x⁺)^{3/4}, which minimises the functional


(6.122). This suggests that instead of a simplified version of the functional (6.119) one can consider a certain class of functions. The natural candidate is

y⁺ = (x⁺)ⁿ. (6.124)

The unknown exponent n should satisfy n ∈ ]0, +∞[. In spite of appearances, this is a fairly wide range of solutions, whose shapes can range from a thin spike to almost a tube. The class of functions (6.124) satisfies the boundary conditions in dimensionless form, y⁺(0) = 0, y⁺(1) = 1.

The known form of the function (6.124) allows the variational problem to be transformed into the classic problem of function optimisation. One looks for the optimal value of the exponent n. This method is somewhat similar to the Ritz method. Its approximate nature lies in the fact that the assumed family of functions (6.124) does not have to contain the exact solution of the functional (6.119). Functional (6.119) now takes the following form

J⁺ := ∫_0^1 n³(x⁺)^{4n−3} / (1 + n²(x⁺)^{2n−2}) dx⁺ (6.125)

and can only be integrated numerically. Figure 6.15 presents the values of the functional J⁺ as a function of the exponent n according to equation (6.125). It is now apparent that the exponent 3/4, being the optimal solution of the simplified functional (6.122), is not the best solution when it comes to the original functional (6.119). The optimal exponent within the class of functions (6.124) is n ≈ 0.46. This leads to the optimal parabola y⁺ = (x⁺)^{0.46} shown in figure 6.14.
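This step is easy to reproduce numerically. A minimal sketch is given below (assuming Python with SciPy is available; the function names are ours, not part of the original study):

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import minimize_scalar

def J_plus(n):
    """Dimensionless functional (6.125) for the power-law shape y+ = (x+)^n."""
    integrand = lambda x: n**3 * x**(4*n - 3) / (1.0 + n**2 * x**(2*n - 2))
    # tiny lower limit avoids evaluating the integrable singularity at x+ = 0
    value, _ = quad(integrand, 1e-12, 1.0)
    return value

res = minimize_scalar(J_plus, bounds=(0.1, 1.0), method="bounded")
print(res.x, J_plus(0.75))   # res.x is approximately 0.46, cf. figure 6.15
```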

Figure 6.15: J⁺ values as a function of the exponent n

6.4.4.5 Approximate solution by means of a Bezier curve

Another approach to the minimisation of functional (6.119) is to discretise the variational problem by means of Bezier curves. This means that the original problem f : C²_{[0,1]} → R is now reduced to a finite-dimensional function optimisation f : R^D → R, where

C²_{[0,1]} := { f : f, f′, f″ : [0, 1] → R are continuous }. (6.126)

Figure 6.16 shows an example of a Bezier curve described by means of five points. The first and last points are fixed, as well as the x coordinate of the second point. The former


geometrical constraint is necessary in order to keep the surface of revolution smooth. The assumed geometrical constraints result in five independent variables, i.e., D = 5.

The objective function is subject to box constraints

Ω := { x ∈ R^D : Lᵢ ≤ xᵢ ≤ Uᵢ }. (6.127)

Differential Evolution was chosen in order to solve the optimisation problem. Uniform random initialisation within the search space, with a random seed based on time, was considered. The algorithm stopped when the number of generations n_max = 30 was reached. The total number of solutions per generation (population size) was N = 20. The scale parameter F of DE and the crossover probability C are listed in table 6.1 together with other optimisation parameters such as the lower L and upper U box constraints.

Table 6.1: Basic parameters of DE

Parameter   Value
D           5
N           20
n_max       30
F           0.9
C           0.7
L           (0, 0.01, 0, 0.5, 0)
U           (1, 0.5, 1, 1, 1)

The optimal Bezier curve, resulting in the lowest value of the functional J, is shown in figure 6.16 and can be compared with the other solutions in figure 6.14.
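A rough sketch of this discretised formulation is given below (assuming SciPy; the Bezier parameterisation, the variable layout and the bounds are illustrative guesses, and SciPy's popsize is a multiplier rather than the absolute population size of table 6.1):

```python
import numpy as np
from math import comb
from scipy.optimize import differential_evolution

def control_points(v):
    # assumed layout: P0 = (0, 0) and P4 = (1, 1) fixed, x of P1 fixed at 0;
    # the D = 5 free variables are taken here as (y1, x2, y2, x3, y3)
    y1, x2, y2, x3, y3 = v
    return np.array([[0.0, 0.0], [0.0, y1], [x2, y2], [x3, y3], [1.0, 1.0]])

def bezier(t, pts):
    """Quartic Bezier curve evaluated with the Bernstein basis."""
    n = len(pts) - 1
    basis = np.array([comb(n, k) * t**k * (1.0 - t)**(n - k) for k in range(n + 1)])
    return basis.T @ pts

def J(v):
    """Functional (6.119) evaluated along the Bezier curve with a numerical derivative."""
    t = np.linspace(1e-4, 1.0 - 1e-4, 400)
    xy = bezier(t, control_points(v))
    x, y = xy[:, 0], xy[:, 1]
    dydx = np.gradient(y, x)
    integrand = dydx**3 * y / (1.0 + dydx**2)
    return float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(x)))

result = differential_evolution(J, bounds=[(0.01, 1.0)] * 5, maxiter=30)
print(result.x, result.fun)
```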

Figure 6.16: Optimal Bezier curve (y⁺ against x⁺)

6.4.5 Summary

Two new approaches and solutions to the minimum drag shape body problem are introduced. Both transform the original variational problem into the classic problem of function optimisation. The first approach is possible due to the assumption of a certain class of functions, namely power law shapes. This makes it possible to find the optimal value of the exponent n = 0.46, which is better than the classic n = 3/4. An even better solution can be accomplished by means of a Bezier curve, leading to discretised functional optimisation.

All solutions can be compared and ordered by means of the drag coefficient

c_d = Rx / (½ ρU_∞² A) (6.128)

where A = πy₀² is the reference area. Table 6.2 presents the drag coefficient ratios, where the reference drag coefficient c_dc has been calculated for the cone. It is clear that the classic power law shape with the exponent n = 3/4 is one of the worst. Modern global optimisation methods such as DE can produce better solutions characterised by lower drag.

Table 6.2: Drag coefficient ratios

Curve               c_d/c_dc
y⁺ = x⁺             1.000
y⁺ = (x⁺)^{3/4}     0.880
y⁺ = (x⁺)^{1/2}     0.805
y⁺ = (x⁺)^{0.46}    0.803
Bezier              0.769


Chapter 7

Multi-objective optimisation

Multi-objective optimisation problems rely on the simultaneous minimisation or maximisation of more than one objective function. In the case of single objective optimisation we deal with only one function. Typically, real problems are multi-objective. Multi-objective optimisation gives a set of solutions as a result, whereas single objective optimisation gives only one.

7.1 Definitions

The vector f of n scalar functions fᵢ is denoted as f := (f₁, . . . , fₙ). If a point x in the D-dimensional space R^D is denoted by x := (x₁, . . . , x_D) then the n-dimensional objective fitness function value is

f(x) := (f1(x), . . . , fn(x)). (7.1)

From this it arises that the function f maps from D- to n-dimensional space

f : RD → Rn. (7.2)

Individual fitness functions fi, being components of f , map

fi : RD → R. (7.3)

7.2 Domination

Let us also introduce a subset N := {1, . . . , n} ⊆ ℕ of the set of natural numbers, containing n elements.

Domination is a key concept for multi-objective optimisation. We say, in the case of minimisation of the function f, that point x₁ dominates over point (or solution) x₂ if

∀_{i∈N} fᵢ(x₁) ≤ fᵢ(x₂)  ∧  ∃_{i∈N} fᵢ(x₁) < fᵢ(x₂) (7.4)

where x1,x2 ∈ Ω ⊆ RD. Here the set of all admissible solutions is denoted as Ω.
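As a small illustration, relation (7.4) can be written directly as a predicate (a Python sketch for minimisation; the function name is ours):

```python
import numpy as np

def dominates(f1, f2):
    """True if objective vector f1 dominates f2 in the sense of (7.4) (minimisation)."""
    f1, f2 = np.asarray(f1), np.asarray(f2)
    return bool(np.all(f1 <= f2) and np.any(f1 < f2))

print(dominates([1.0, 2.0], [1.5, 2.0]))   # True
print(dominates([1.0, 3.0], [1.5, 2.0]))   # False: neither point dominates the other
```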


7.2.1 The Pareto set

We say, by negation of definition (7.4), that x₁ does not dominate (for minimisation) over x₂ (or x₂ is not dominated by x₁) if

∃_{i∈N} fᵢ(x₁) > fᵢ(x₂)  ∨  ∀_{i∈N} fᵢ(x₁) ≥ fᵢ(x₂). (7.5)

The Pareto set Π of the solution is a set of those points which are not dominated by others from the set of admissible solutions Ω. This can be denoted as

Π := { xⱼ ∈ Ω : ∀_{x∈Ω} ( ∃_{i∈N} fᵢ(x) > fᵢ(xⱼ)  ∨  ∀_{i∈N} fᵢ(x) ≥ fᵢ(xⱼ) ) }. (7.6)

7.2.2 The Pareto front

The Pareto set Π is a set of points which are not dominated. The Pareto front P is a set of values with coordinates corresponding to the Pareto set elements. It is denoted as

P := { f(x) ∈ Rⁿ : x ∈ Π }. (7.7)

7.3 Scalarisation

The basic idea relies on reducing (scalarising) a multi-objective optimisation function

f : RD → Rn (7.8)

to a single objective function

f : R^D → R (7.9)

in order to use standard optimisation methods. Here a few popular methods are discussed briefly:
• method of weighted-sum,
• method of target vector,
• method of minimax.

7.3.1 Method of weighted-sum

The most popular method is the method of weighted-sum. The n-dimensional vector of weights w := (w₁, . . . , wₙ) is composed of individual weights wᵢ ∈ [0, 1] which can be freely selected, provided that

∑_{i=1}^{n} wᵢ = 1. (7.10)

The values of the individual weights represent the importance of a given function fᵢ. The function f is obtained from f by the dot product of the function f and the vector of weights w in the form of

f(x) := w · f(x) = ∑_{i=1}^{n} wᵢ fᵢ(x). (7.11)


Proper selection of weights produces convergence to individual elements of the Pareto set.
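In code, the weighted-sum scalarisation (7.11) is a one-liner (a sketch; the helper name is ours):

```python
import numpy as np

def weighted_sum(f, w):
    """Scalarised objective (7.11): x -> sum_i w_i f_i(x), with the weights summing to 1."""
    w = np.asarray(w, dtype=float)
    return lambda x: float(w @ np.asarray(f(x), dtype=float))
```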

7.3.2 Method of target vector

There are also the target vector g := (g₁, . . . , gₙ) and minimax methods. Both methods reduce the function f to f by means of

f(x) := ‖(f(x) − g) · W⁻¹‖_α. (7.12)

The symbol W represents a matrix of weights, usually diagonal of size n × n. The vector g represents imaginary optimal values towards which an algorithm tries to converge.

For the target vector method the norm ‖·‖_α is taken to be the generalised Euclidean one in the form

‖a‖_α := ( ∑_{i=1}^{n} |aᵢ|^α )^{1/α}. (7.13)

Usually α := 2 and W := δ, where δ represents the Kronecker delta. Equation (7.12) then reduces to

f(x) := √( ∑_{i=1}^{n} (fᵢ(x) − gᵢ)² ). (7.14)

7.3.3 Method of minimax

It is a special case of the target vector method. For the minimax method we take the so-called 'maximum' norm

‖a‖_α := max_{i∈N} aᵢ. (7.15)

Reducing the matrix W to diagonal form we obtain

f(x) := max_{i∈N} (fᵢ(x) − gᵢ)/wᵢ. (7.16)

This method relies on the minimisation of the maximum norm of the difference between the objective fitness function and the target vector. The results are similar to those from the target vector method.
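Both scalarisations can be sketched in the same style (our helper names; W is assumed diagonal, so it acts as an element-wise division by the weights):

```python
import numpy as np

def target_vector(f, g, w=None, alpha=2.0):
    """Scalarisation (7.12)-(7.13): generalised distance of f(x) from the target vector g."""
    g = np.asarray(g, dtype=float)
    w = np.ones_like(g) if w is None else np.asarray(w, dtype=float)
    def scalarised(x):
        d = (np.asarray(f(x), dtype=float) - g) / w
        return float(np.sum(np.abs(d)**alpha)**(1.0 / alpha))
    return scalarised

def minimax(f, g, w):
    """Scalarisation (7.16): the worst weighted deviation from the target vector."""
    g, w = np.asarray(g, dtype=float), np.asarray(w, dtype=float)
    return lambda x: float(np.max((np.asarray(f(x), dtype=float) - g) / w))
```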

7.4 SPEA

The SPEA algorithm [33] maintains non-dominated solutions in a separate set, the Pareto set Πᵢ. In each generation, new solutions (if any) from the current population Pᵢ which are not dominated by others from this population are added to this set. Subsequently, solutions which have become dominated (if any exist) are excluded from the Pareto set Πᵢ, since their status could change due to a new solution added to it from Pᵢ.

The selection process of solutions is similar to an ordinary genetic algorithm, with the difference that parents are selected from the current ith population Pᵢ which is


increased by individuals from the Pareto set Πᵢ. Crossover and mutation have an identical form as in an ordinary genetic algorithm.

The single value fitness function of an individual, which represents its adaptation, is calculated differently for Πᵢ and Pᵢ. For the Pareto set Πᵢ the fitness function for the jth individual has the form

f(xⱼ) := |Zᵢ| / (|Pᵢ| + 1) ∈ [0, 1[ (7.17)

where |Zᵢ| denotes the number of individuals from Pᵢ which are dominated by the jth individual xⱼ from Πᵢ. For an individual xₖ from the population Pᵢ we have

f(xₖ) := |Dᵢ| + 1 ∈ [1, |Πᵢ| + 1] (7.18)

where |Dᵢ| denotes the number of elements of Πᵢ that dominate over xₖ. This means that mutual domination of individuals from Pᵢ is of no importance.
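The fitness assignment (7.17)-(7.18) can be sketched as follows (reusing dominates from section 7.2; a simplified illustration, not the full SPEA implementation):

```python
def spea_fitness(archive_F, pop_F):
    """Fitness values for the archive (Pareto set) and the population, eqs (7.17)-(7.18).
    archive_F and pop_F are lists of objective vectors."""
    n_pop = len(pop_F)
    f_archive = [sum(dominates(a, p) for p in pop_F) / (n_pop + 1) for a in archive_F]
    f_pop = [1 + sum(dominates(a, p) for a in archive_F) for p in pop_F]
    return f_archive, f_pop
```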

7.5 Examples

7.5.1 Two objective fitness functions of a single variable

Let us first consider a case of minimisation of a simple test function. This function is composed of two functions of a single variable

f(x) := (f₁(x), f₂(x)). (7.19)

These functions are given by the definitions

f₁(x) := x² + 2, (7.20a)

f₂(x) := (x + 1)² (7.20b)

where x ∈ [−4, 4] =: Ω, see figure 7.1.

Figure 7.1: Two objective functions of a single variable (f₁, f₂ against x)


7.5.1.1 Analytical solution

It is possible to find an analytical solution by means of the weighted-sum method. This reduces the multi-objective problem to a single objective one. According to equation (7.11) we have f(x) = w₁f₁(x) + w₂f₂(x). The weights wᵢ are not independent. They must fulfil equation (7.10). If we denote the first weight as λ then the second may be expressed as 1 − λ by means of (7.10). Thanks to the definitions (7.20) the above equation may be rewritten as f(x) = λ(x² + 2) + (1 − λ)(x + 1)². The necessary condition for optimality f′(x) = 0 gives x = λ − 1, which is the parametric representation of the Pareto set Π, see figure 7.1 (dots)

Π := { x : x(λ) = λ − 1; λ ∈ [0, 1] }. (7.21)

The independent variable x (i.e. the Pareto set) is limited to x ∈ [−1, 0] = Π. This is because λ in equation (7.21) must fulfil condition (7.10).

The representation of the Pareto set Π = [−1, 0] through the transformations f₁ and f₂ takes the forms

f1([−1, 0]) = [2, 3], (7.22a)

f2([−1, 0]) = [0, 1]. (7.22b)

Now we can obtain a parametric description of the Pareto front on the plane f₁f₂ by means of equations (7.21) and (7.20), see figure 7.2 (solid line)

P := { (f₁, f₂) : f₁(λ) := λ² − 2λ + 3, f₂(λ) := λ², λ ∈ [0, 1] }. (7.23)

7.5.1.2 Single objective reconstruction of Pareto set

Choosing weights from the interval [0, 1] and performing single objective optimisation for the selected weights it is possible to reconstruct the Pareto set. For the 11 different values of weights listed in table 7.1 one may easily reconstruct it; the resulting set is shown in figure 7.2 (dots). Very good agreement between the analytical and numerical solutions can be observed.
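The reconstruction itself takes a few lines (a sketch assuming SciPy; it uses a bounded scalar minimiser rather than the optimiser used for table 7.1, so the numbers will differ slightly):

```python
import numpy as np
from scipy.optimize import minimize_scalar

f1 = lambda x: x**2 + 2.0          # (7.20a)
f2 = lambda x: (x + 1.0)**2        # (7.20b)

for lam in np.linspace(0.0, 1.0, 11):
    scalarised = lambda x: lam * f1(x) + (1.0 - lam) * f2(x)
    res = minimize_scalar(scalarised, bounds=(-4.0, 4.0), method="bounded")
    # analytically x = lam - 1, cf. (7.21) and table 7.1
    print(f"{lam:.1f}  x = {res.x:+.4f}  f1 = {f1(res.x):.4f}  f2 = {f2(res.x):.4f}")
```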

7.5.1.3 Multi-objective SPEA

Parameters of the binary representation genetic algorithm are listed in table 7.2. Figure 7.3 shows all the solutions from the whole multi-objective optimisation process (left) and the Pareto set (right). Comparing figures 7.2 and 7.3 we can again observe good agreement between the numerical and analytical solutions.

7.5.2 Two objective fitness functions of two variables

Let us now consider a case of minimisation of another test function. This function is composed of two functions of two variables, f(x, y) := (f₁(x, y), f₂(x, y)). These are given by the definitions

f₁(x, y) := (x − 2)² + (y − 2)², (7.24a)

f₂(x, y) := x² + (y + 2)² (7.24b)

where x := (x, y) ∈ [−4, 4]² =: Ω, see figure 7.4.


Table 7.1: Pareto set reconstruction

λ     f(x)            x               f1(x)    f2(x)
0     4.17 × 10⁻¹¹    −1              3        10⁻¹⁰
0.1   0.29            −0.9            2.81     0.01
0.2   0.56            −0.8            2.64     0.04
0.3   0.81            −0.7            2.49     0.09
0.4   1.04            −0.6            2.36     0.16
0.5   1.25            −0.4992         2.2492   0.2507
0.6   1.44            −0.4            2.16     0.3599
0.7   1.61            −0.3            2.09     0.4899
0.8   1.76            −0.2            2.04     0.64
0.9   1.89            −0.0999         2.0099   0.8101
1     2               −1.25 × 10⁻⁸    2        1

Figure 7.2: Reconstructed Pareto set of two objective functions of a single variable (f₂ against f₁)

Table 7.2: Basic parameters for Pareto set reconstruction for two functions of a single variable

Parameter               Value
Chromosome length       20
Bits per variable       20
Tournament size         2
Population size         20
Crossover probability   0.9
Mutation probability    0.05
Number of generations   30


Figure 7.3: Reconstructed Pareto front of two objective functions of a single variable (all solutions, left; Pareto front, right)

Figure 7.4: Two objective functions of two variables (y against x)

7.5.2.1 Analytical solution

Following the same logic as for the optimisation of a single variable function described previously, we have

f(x, y) = λ((x − 2)² + (y − 2)²) + (1 − λ)(x² + (y + 2)²). (7.25)

The necessary condition for optimality ∇f = 0 gives x = 2λ and y = 4λ − 2, which is the parametric representation of the Pareto set Π, see figure 7.4 (dots)

Π := { (x, y) : x(λ) = 2λ, y(λ) = 4λ − 2; λ ∈ [0, 1] }. (7.26)


Due to the condition λ ∈ [0, 1] we obtain the following sets for the independent variables x and y

x ∈ [0, 2], (7.27a)

y ∈ [−2, 2]. (7.27b)

The representation of these sets through the transformations f₁ and f₂ takes exactly the same form

f1 ([0, 2]× [−2, 2]) = f2 ([0, 2]× [−2, 2]) = [0, 20]. (7.28)

The parametric description of the Pareto front P on the plane f₁f₂ may be obtained by means of equations (7.24) and (7.26), see figure 7.5 (solid line)

P := { (f₁, f₂) : f₁(λ) = 20(λ − 1)², f₂(λ) = 20λ², λ ∈ [0, 1] }. (7.29)

Figure 7.5: Reconstructed Pareto set of two objective functions of two variables (f₂ against f₁)

7.5.2.2 Single objective reconstruction of Pareto set

Choosing weights from the interval [0, 1] and performing single objective optimisation for the selected weights one can reconstruct the Pareto set. For the 11 different values of weights listed in table 7.3 one may reconstruct the Pareto set, which is shown in figure 7.5 (dots). As previously, very good agreement between the analytical and numerical solutions may be noticed.

7.5.2.3 Multi-objective SPEA

Parameters of the binary representation genetic algorithm are listed in table 7.4. Figure 7.6 shows all the solutions from the whole multi-objective optimisation process (left) and the Pareto set (right). Comparing figures 7.5 and 7.6 we can again observe good agreement between the numerical and analytical solutions. In a case where we deal with more than one independent variable the chromosome length is larger as well. This forces the population size to be correspondingly larger.


Figure 7.6: Reconstructed Pareto front of two objective functions of two variables (all solutions, left; Pareto front, right)

Table 7.3: Pareto set reconstruction

λ     f(x, y)         x         y              f1(x, y)    f2(x, y)
0     8.98 × 10⁻⁷     0.0002    −1.9991        19.9917     9.0591 × 10⁻⁷
0.1   1.8             0.2002    −1.6           16.1989     0.2001
0.2   3.2             0.4       −1.1994        12.7962     0.8009
0.3   4.2             0.6       −0.8           9.8003      1.7998
0.4   4.8             0.7999    −0.4           7.2001      3.1999
0.5   5               0.9999    1.79 × 10⁻⁶    4.9999      5
0.6   4.8             1.2       0.3995         3.2014      7.1978
0.7   4.2             1.4       0.7999         1.7999      9.8001
0.8   3.2             1.6       1.2            0.8         12.8
0.9   1.8             1.7996    1.6            0.2001      16.1987
1     7.47 × 10⁻⁹     2         1.9999         9.7 × 10⁻⁹  19.9994

Table 7.4: Basic parameters for Pareto set reconstruction for two functions of two variables

Parameter               Value
Chromosome length       30
Bits per variable       15
Tournament size         2
Population size         30
Crossover probability   0.9
Mutation probability    0.1
Number of generations   50


7.6 Multi-objective description of Murray’s law

7.6.1 Introduction

The two Murray’s laws describe the pattern of large to small or conversely small tolarge artery bifurcation. This is due to the optimal configuration of arteries thatallows for fastest transport with minimal work involved. Murray’s laws are valid forthe tree structure of arteries, see figure 7.7.

One of Murray’s laws gives a formula for the radii, whereas the second law speci-fies the angles of bifurcation. Both laws were formulated for arteries but they are notlimited to this venue. Some technical applications also exist. Single criterion min-imisation methods were used for the derivation. A generalisation of Murray’s law formulti-objective formulation is proposed here. It is shown that the original formulationof the optimal condition is a particular case of multi-objective formulation.

Figure 7.7: Tree structure

The original Murray’s reasoning takes into consideration two energy terms thatcontribute to maintaining the blood flow. These are the energy necessary for over-coming the viscous drag (dissipation energy) and the metabolic power necessary formaintaining the volume of blood within an artery. For steady state flows it is moreconvenient to use dissipation power N instead of energy E. These quantities are ex-plicitly related as E =

r t0N dt = N t. The dissipation power is given by Nd = V∆p.

By means of Hagen-Poiseuille’s law and the definition of constant A := 8µLπ−1 it ispossible to express this power as Nd = AV 2R−4. The metabolic power is expressed asNm = mV where m is a metabolic coefficient and volume V is given by V = πR2L.Introducing another constant B := πLm we can rewrite the metabolic power Nm inthe following form Nm = BR2.

The equation for Nd suggests that dissipation power Nd is related inversely, andmetabolic power Nm directly to radius R. It suggest that there exists an intermediateradius which minimises the total power N = Nd + Nm. This total power may beexpressed as N = AV 2R−4 + BR2. For a given V the total power N is a functionof R. The stationary point can be found from the condition N ′(R) = 0 which givesR = V 1/3C−1/3 or

V = CR3, (7.30)

where constant C is combined of A and B as C := 2−1/2A−1/2B1/2. The sign ofthe second derivative is N ′′(V 1/3C−1/3) = 12B ≥ 0. This is because A and B are


positive. The solution (7.30) represents a constant relation between the volumetric flow rate and the radius in every cross-section of an artery. This is also a condition for the minimal energy requirement.

The mass conservation equation tells us that the flow rate before any bifurcation equals the sum of the individual flow rates after that bifurcation, V̇₀ = ∑ᵢ V̇ᵢ = CR₀³. This is true for incompressible and steady state flows. Equation (7.30) allows us to write ∑ᵢ V̇ᵢ = C ∑ᵢ Rᵢ³. This is true because before and after the bifurcation we deal with the same fluid, which means that we have the same constant C. The above two equations give us Murray's law, which states that the cube of the radius of the parent artery equals the sum of the cubes of the radii of the daughter arteries. This is written as

R₀³ = ∑ᵢ Rᵢ³. (7.31)

In the case of a bifurcation this simplifies to R₀³ = R₁³ + R₂³. It is the most widespread form of Murray's law and it is known that a large part of the branching of the mammalian circulatory and respiratory systems obeys it.

Assuming that Murray's law is valid it is possible to evaluate the metabolic coefficient m. Using the definitions of A, B and C one can show that this coefficient may be written as m = 16π⁻²µC². Since equation (7.30) is valid for every branch, it allows us to determine the value of the constant C. Eventually, the metabolic coefficient is given by means of the following equation

m = 16µV̇² / (π²R⁶). (7.32)

7.6.2 Multi-objective description

Since the Murray reasoning takes into consideration two powers (objective functions), it is a natural multi-objective optimisation problem. A whole set of optimal solutions, known as the Pareto set, is obtained as a solution of such a problem. The problem considered by Murray (the sum of two powers) is just one particular scalarisation method.

A simultaneous optimisation of the two powers (objective functions) in the form N := (N_d, N_m) results in a non-dominated set of solutions (Pareto set). It is possible to obtain an analytical solution describing the Pareto front P. Taking advantage of the weighted-sum method we can obtain a parametric representation of the solution, where λ is a parameter. The scalarised form of the objective function is obtained as N := w · N and takes the shape of N := λN_d + (1 − λ)N_m. The necessary condition for optimality N′(R) = 0 gives us a formula which is analogous to (7.30)

V̇ = (λ⁻¹ − 1) CR³. (7.33)

According to Murray, the dissipation power N_d and the metabolic power N_m have an identical contribution to the total power N. This corresponds to the situation where the weight in equation (7.33) equals λ = 1/2. In the multi-objective description it means that both powers are equally important. However, it follows from equation (7.33) that it does not have to be so. If we incorporate equation (7.33) into the mass


conservation equation V̇₀ = ∑ᵢ V̇ᵢ we obtain the well known form of Murray's law (7.31). This means that one of the powers can have a larger share than the other. A 'share' means weights λ and 1 − λ for any λ ∈ ]0, 1[. The above reasoning generalises Murray's law.

The radius R may be found from equation (7.33) and incorporated into the equations N_d = A V̇² R⁻⁴ and N_m = BR². Rearranging and introducing the dimensionless powers N⁺_d, N⁺_m we have

N⁺_d := N_d / (AB²V̇²)^{1/3} = 2^{−2/3} (λ⁻¹ − 1)^{4/3}, (7.34a)

N⁺_m := N_m / (AB²V̇²)^{1/3} = 2^{1/3} (λ⁻¹ − 1)^{−2/3}. (7.34b)

The Pareto front P takes the following form

P = { (N⁺_d, N⁺_m) : N⁺_d := 2^{−2/3} (λ⁻¹ − 1)^{4/3}, N⁺_m := 2^{1/3} (λ⁻¹ − 1)^{−2/3} }. (7.35)

The Pareto front is shown in figure 7.8. The point with weight λ = 1/2 is marked.
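The front (7.35) is straightforward to tabulate (a short sketch; the sampling of λ is arbitrary):

```python
import numpy as np

lam = np.linspace(0.05, 0.95, 19)
k = 1.0 / lam - 1.0
Nd_plus = 2.0**(-2.0 / 3.0) * k**(4.0 / 3.0)    # (7.34a)
Nm_plus = 2.0**(1.0 / 3.0) * k**(-2.0 / 3.0)    # (7.34b)
# lam = 0.5 reproduces the original Murray point (2**(-2/3), 2**(1/3)) on the front
```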

Figure 7.8: The Pareto front (N⁺_m against N⁺_d; the target vector (0, 0) and the points λ = 0.457 and λ = 0.5 are marked)

Instead of choosing a solution that treats both powers as equally important (λ = 1/2), it is also possible to apply the target vector method. The target vector is chosen as an imaginary optimum. A natural choice for this optimum is a vector for which both powers equal zero. This is really an imaginary vector, because the dissipation power for viscous flows does not equal zero. Nevertheless, the target vector is assumed to be g = (0, 0). This vector is located at the origin of the coordinate system in figure 7.8. The scalarisation of N to N is done by means of the norm N := ‖N − g‖₂, because W := δ. The norm ‖·‖₂ is just the Euclidean norm, N = (N_d² + N_m²)^{1/2}. The assumed target vector g = 0 also simplifies the calculations. The necessary condition for optimality N′(R) = 0 gives an equation similar to (7.33)

V̇ = 2^{1/4} CR³. (7.36)


The weight of this solution equals λ ≈ 0.457 and is shown in figure 7.8. Interestingly, the point (N⁺_d, N⁺_m) for this weight is not placed closer to the target vector (0, 0). This is because the Pareto front converges 'faster' to its own vertical asymptote than to the horizontal one. It is clearly visible in figure 7.8. Again, using formula (7.36) and the mass conservation equation, it is possible to show that Murray's law takes the standard form (7.31). What is more, one can observe that the solution (7.36) obtained by means of the target-vector method is another particular case of that obtained by means of the weighted-sum method (7.33). The same concerns the original Murray solution, for which λ = 1/2.


Chapter 8

Statistical analysis

8.1 Distributions

The initial population is crucial in order for many multi-point algorithms to perform properly. Typically, a uniform random distribution of points xᵢ within the search space, with a random seed based on time, is considered, namely

xᵢ := L + (U − L) U(0, 1). (8.1)

The above distribution was used exclusively as an initialisation for all algorithms in this chapter. An example realisation of the random sequence (8.1) is shown in figure 8.1 for 1000 points. What is more, uniform distributions should be used when there is no information about the optimum location.

The Halton sequence [11] generates a deterministic distribution of points xᵢ that looks random and is more uniform in comparison with the uniform random distribution according to equation (8.1), see figure 8.2. The Halton sequence, in fact, is referred to as a quasi-random sequence. Listing 8.1 presents the pseudocode of the Halton sequence.

Input: i, p
Output: h
1 h := 0;
2 f := 1;
3 j := i;
4 while j > 0 do
5   f := f/p;
6   h := h + f (j mod p);
7   j := ⌊j/p⌋;
Algorithm 8.1: Halton sequence pseudocode
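A direct Python transcription of algorithm 8.1 (a sketch; the function name is ours):

```python
def halton(i, p):
    """The ith element of the Halton sequence for the (prime) base p, after algorithm 8.1."""
    h, f, j = 0.0, 1.0, i
    while j > 0:
        f /= p
        h += f * (j % p)
        j //= p
    return h

# a 2-D quasi-random sample of [0, 1)^2 uses two different prime bases, e.g. 2 and 3
points = [(halton(i, 2), halton(i, 3)) for i in range(1, 1001)]
```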

If there is some information about the possible location of the optimum, distributions other than the uniform one may provide better convergence of the considered algorithm.


For instance, the normal distribution with a scale parameter α is an obvious choice

xᵢ := ½(L + U) + α (U − L) N(0, 1). (8.2)

An example realisation of this random sequence is shown in figure 8.3 (α = 1/5) and figure 8.4 (α = 1/10) for 1000 points. The lower the parameter α, the more concentrated the normal distribution is around ½(L + U).

Figure 8.1: Uniform distribution (1000 points, y against x over [0, π]²)
Figure 8.2: Halton sequence (1000 points, y against x over [0, π]²)
Figure 8.3: Normal distribution, α = 1/5 (1000 points, y against x over [0, π]²)
Figure 8.4: Normal distribution, α = 1/10 (1000 points, y against x over [0, π]²)


8.2 Discrepancy

Discrepancy is a measure of irregularity and it is helpful for inspecting the similarity or regularity of a population. The discrepancy for a population P of n points

P = { x₁, . . . , xₙ } (8.3)

in the m-dimensional unit cube [0, 1[^m is defined as

D* = sup{ D(J, P) : J ∈ T* } (8.4)

where the local discrepancy D is

D(J, P) = | (1/n) |{ xᵢ ∈ P : xᵢ ∈ J }| − Vol J |. (8.5)

Vol J is the measure of the subinterval J of the form

J := ∏_{i=1}^{m} [0, jᵢ[ (8.6)

and T* is the family of all (discrete) subintervals of the unit cube [0, 1[^m. We have

0 < D∗ ≤ 1. (8.7)

A value close to 0 represents a 'random' population and is typical of the first generations. 'Regular' populations possess values close to 1.

8.3 Single-problem statistical analysis

The test suite [15] includes 28 functions fᵢ : R^D → R that are typically used as benchmarks, where x = (x₁, . . . , x_D). They appear in the Special Session & Competition on Real-Parameter Single Objective Optimization at CEC-2013. All test functions are shifted and scalable and the same search range is defined for all of them, namely [−100, 100]^D, which means that Lₖ = −100, Uₖ = 100.

For each of the 28 functions the calculations are executed 30 times for each algorithm and the average error of the best individuals of the population is computed. For a solution x the error measure is defined as

Error := fᵢ(x) − fᵢ(x₀), (8.8)

for

x₀ = arg minₓ fᵢ(x) (8.9)

being the optimum of the particular function fᵢ. All algorithms stop when the number of generations n_max is reached. The total number of generations n_max is related to the maximal number of function evaluations. Different cases in terms of D and the maximal number of evaluations are considered. The total number of individuals is N = 20 for all


algorithms. The individual values of the algorithm parameters are the same and are shown in the listings in appendix A.

Statistical analysis over the test suite follows the method given in [8, 9]. The problem is referred to as a single-problem analysis and considers a comparison of several algorithms over a single function. The method comprises the use of both parametric and non-parametric statistical tests. Because the required conditions for using a parametric test such as the paired t-test are not fulfilled, these tests are not considered here. Non-parametric tests such as the Wilcoxon test are utilised instead. The required conditions in order to use parametric tests [8, 32] are: independence, normality and homoscedasticity. As for independence, it is fulfilled because we deal with independent runs of the algorithms starting with randomly generated populations. Normality is never fulfilled because the results do not follow a normal distribution. This is also confirmed by means of the Kolmogorov-Smirnov, Shapiro-Wilk and D'Agostino-Pearson tests. The last condition, i.e., homoscedasticity, is related to the hypothesis of equality of variances and is also not fulfilled according to the Levene test.

For instance, table 8.2 presents rankings based on means, analogous to the Friedman ranks, that correspond directly to the positions P of the algorithms. It is evident that the best solution is obtained here by PSO. Comparing the p-values from the non-parametric Wilcoxon test in table 8.2 it is obvious that PSO outperforms the remaining algorithms, because all the p-values are below 0.05. The same is confirmed by means of the box and whisker plot in figure 8.7. However, table 8.4 shows that the best performing algorithm is DE/B, as it outperforms all algorithms except DE/R.
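The pairwise comparison behind these tables can be reproduced with a standard statistics library (a sketch assuming SciPy; the error samples below are random placeholders, not the book's results):

```python
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(0)
errors_a = rng.random(30) * 1e-3                           # 30 independent runs of algorithm A
errors_b = errors_a + np.abs(rng.normal(0.05, 0.01, 30))   # algorithm B, clearly worse here

stat, p = wilcoxon(errors_a, errors_b)
print(p < 0.05)   # True -> the difference between the paired samples is significant
```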


Test function 1/28. Sphere function

f₁(z) = ∑_{i=1}^{D} z_i² (8.10)

where z = x − o. (8.11)

The shifted global optimum o is randomly distributed in [−80, 80]^D.
Properties:
• unimodal
• separable

Figure 8.5: f1 : R2 → R
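A direct transcription of (8.10)-(8.11) (a sketch; the shift vector o would normally be read from the CEC-2013 data files, here it is a random placeholder):

```python
import numpy as np

rng = np.random.default_rng(0)
D = 10
o = rng.uniform(-80.0, 80.0, D)   # placeholder shift vector

def f1(x):
    """Shifted sphere function, equations (8.10)-(8.11)."""
    z = np.asarray(x, dtype=float) - o
    return float(np.sum(z**2))

print(f1(o))   # 0.0 at the shifted optimum
```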


Figure 8.6: Test function 1/28, D = 2, 2000 evaluations (box-and-whisker plot of the error for each algorithm)

Table 8.1: Position and Wilcoxon test p-value as a function of an algorithm.

Algorithm  P   p-value
GA         4   1.825 × 10⁻⁶
DE/R       1   −
DE/B       1   −
PSO        3   5.921 × 10⁻⁶
APSO1      9   1.825 × 10⁻⁶
APSO2      10  1.825 × 10⁻⁶
FA         5   1.825 × 10⁻⁶
CS         6   1.825 × 10⁻⁶
BA         8   1.825 × 10⁻⁶
FPA/R      11  1.825 × 10⁻⁶
FPA/B      2   0.006
GSA        7   1.825 × 10⁻⁶

Figure 8.7: Test function 1/28, D = 10, 10⁴ evaluations (box-and-whisker plot of the error for each algorithm)

Table 8.2: Position and Wilcoxon test p-value as a function of an algorithm.

Algorithm  P   p-value
GA         4   1.825 × 10⁻⁶
DE/R       3   1.825 × 10⁻⁶
DE/B       2   1.276 × 10⁻⁴
PSO        1   −
APSO1      9   1.825 × 10⁻⁶
APSO2      10  1.825 × 10⁻⁶
FA         5   1.825 × 10⁻⁶
CS         6   1.825 × 10⁻⁶
BA         8   1.825 × 10⁻⁶
FPA/R      12  1.825 × 10⁻⁶
FPA/B      11  1.825 × 10⁻⁶
GSA        7   1.825 × 10⁻⁶


Test function 2/28. Rotated high conditioned elliptic function

f₂(z) = ∑_{i=1}^{D} 10^{6(i−1)/(D−1)} z_i² (8.12)

where z = T_osz(M₁ · (x − o)). (8.13)

Orthogonal matrices M₁, M₂, . . . , M₁₀ are generated from standard normally distributed entries by Gram-Schmidt orthonormalisation. T_osz for x_i is defined as

T_osz(x_i) := e^{x̂_i + 0.049(sin c₁x̂_i + sin c₂x̂_i)} sgn x_i (8.14)

where

x̂_i = ln|x_i| if |x_i| ≠ 0, 0 otherwise, (8.15)

c₁ = 10 if x_i > 0, 5.5 otherwise, (8.16)

c₂ = 7.9 if x_i > 0, 3.1 otherwise. (8.17)

Properties:
• unimodal
• non-separable
• quadratic ill-conditioned
• smooth local irregularities

Figure 8.8: f2 : R2 → R
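The T_osz transformation (8.14)-(8.17) acts element-wise and can be sketched as follows (our own transcription, not the reference CEC-2013 code):

```python
import numpy as np

def t_osz(x):
    """Element-wise T_osz transformation, equations (8.14)-(8.17)."""
    x = np.asarray(x, dtype=float)
    x_hat = np.zeros_like(x)
    nonzero = x != 0.0
    x_hat[nonzero] = np.log(np.abs(x[nonzero]))        # (8.15)
    c1 = np.where(x > 0.0, 10.0, 5.5)                  # (8.16)
    c2 = np.where(x > 0.0, 7.9, 3.1)                   # (8.17)
    return np.sign(x) * np.exp(x_hat + 0.049 * (np.sin(c1 * x_hat) + np.sin(c2 * x_hat)))
```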


Figure 8.9: Test function 2/28, D = 2, 2000 evaluations (box-and-whisker plot of the error for each algorithm)

Table 8.3: Position and Wilcoxon test p-value as a function of an algorithm.

Algorithm  P   p-value
GA         7   1.825 × 10⁻⁶
DE/R       1   −
DE/B       2   3.256 × 10⁻⁵
PSO        10  1.825 × 10⁻⁶
APSO1      8   1.825 × 10⁻⁶
APSO2      9   1.825 × 10⁻⁶
FA         5   1.825 × 10⁻⁶
CS         4   1.825 × 10⁻⁶
BA         6   1.825 × 10⁻⁶
FPA/R      12  1.825 × 10⁻⁶
FPA/B      11  1.825 × 10⁻⁶
GSA        3   1.825 × 10⁻⁶

Figure 8.10: Test function 2/28, D = 10, 10⁴ evaluations (box-and-whisker plot of the error for each algorithm)

Table 8.4: Position and Wilcoxon test p-value as a function of an algorithm.

Algorithm  P   p-value
GA         10  1.825 × 10⁻⁶
DE/R       2   0.144
DE/B       1   −
PSO        3   6.050 × 10⁻⁵
APSO1      6   1.825 × 10⁻⁶
APSO2      7   1.825 × 10⁻⁶
FA         8   3.694 × 10⁻⁶
CS         5   2.021 × 10⁻⁶
BA         4   2.475 × 10⁻⁶
FPA/R      12  1.825 × 10⁻⁶
FPA/B      11  1.825 × 10⁻⁶
GSA        9   1.825 × 10⁻⁶


Test function 3/28. Rotated bent cigar function.

f₃(z) = z₁² + 10⁶ ∑_{i=2}^{D} z_i² (8.18)

where z = M₂ · T^{0.5}_asy(M₁ · (x − o)) (8.19)

and

T^β_asy(x_i) := x_i^{1 + β (i−1)/(D−1) √x_i} if x_i > 0, x_i otherwise. (8.20)

Properties:
• unimodal
• non-separable
• smooth but narrow ridge

Figure 8.11: f3 : R2 → R


Figure 8.12: Test function 3/28, D = 2, 2000 evaluations (box-and-whisker plot of the error for each algorithm)

Table 8.5: Position and Wilcoxon test p-value as a function of an algorithm.

Algorithm  P   p-value
GA         10  1.825 × 10⁻⁶
DE/R       2   2.974 × 10⁻⁵
DE/B       1   −
PSO        8   1.825 × 10⁻⁶
APSO1      5   1.825 × 10⁻⁶
APSO2      6   1.825 × 10⁻⁶
FA         9   1.825 × 10⁻⁶
CS         3   1.825 × 10⁻⁶
BA         7   1.825 × 10⁻⁶
FPA/R      12  1.825 × 10⁻⁶
FPA/B      11  2.695 × 10⁻⁶
GSA        4   1.825 × 10⁻⁶

Figure 8.13: Test function 3/28, D = 10, 10⁴ evaluations (box-and-whisker plot of the error for each algorithm)

Table 8.6: Position and Wilcoxon test p-value as a function of an algorithm.

Algorithm  P   p-value
GA         3   0.209
DE/R       1   −
DE/B       5   1.102 × 10⁻⁴
PSO        7   3.694 × 10⁻⁶
APSO1      10  1.825 × 10⁻⁶
APSO2      9   1.825 × 10⁻⁶
FA         2   0.043
CS         6   1.825 × 10⁻⁶
BA         8   1.825 × 10⁻⁶
FPA/R      11  1.825 × 10⁻⁶
FPA/B      12  1.825 × 10⁻⁶
GSA        4   2.945 × 10⁻⁴


Test function 4/28. Rotated discus function

f₄(z) = 10⁶ z₁² + ∑_{i=2}^{D} z_i² (8.21)

where z = T_osz(M₁ · (x − o)). (8.22)

Properties:
• unimodal
• non-separable
• asymmetrical
• smooth local irregularities with one sensitive direction

Figure 8.14: f4 : R2 → R


Figure 8.15: Test function 4/28, D = 2, 2000 evaluations (box-and-whisker plot of the error for each algorithm)

Table 8.7: Position and Wilcoxon test p-value as a function of an algorithm.

Algorithm  P   p-value
GA         8   1.825 × 10⁻⁶
DE/R       2   1.825 × 10⁻⁵
DE/B       1   −
PSO        10  1.825 × 10⁻⁶
APSO1      11  1.825 × 10⁻⁶
APSO2      12  1.825 × 10⁻⁶
FA         6   1.825 × 10⁻⁶
CS         4   1.825 × 10⁻⁶
BA         9   1.825 × 10⁻⁶
FPA/R      7   1.825 × 10⁻⁶
FPA/B      5   1.825 × 10⁻⁶
GSA        3   1.825 × 10⁻⁶

Figure 8.16: Test function 4/28, D = 10, 10⁴ evaluations (box-and-whisker plot of the error for each algorithm)

Table 8.8: Position and Wilcoxon test p-value as a function of an algorithm.

Algorithm  P   p-value
GA         8   2.065 × 10⁻⁵
DE/R       2   0.446
DE/B       1   −
PSO        11  1.825 × 10⁻⁶
APSO1      3   0.012
APSO2      6   1.539 × 10⁻⁴
FA         9   1.717 × 10⁻⁵
CS         10  4.9675 × 10⁻⁶
BA         4   0.003
FPA/R      7   6.653 × 10⁻⁶
FPA/B      12  2.475 × 10⁻⁶
GSA        5   4.967 × 10⁻⁶


Test function 5/28. Different powers function

f₅(z) = √( ∑_{i=1}^{D} |z_i|^{2 + 4(i−1)/(D−1)} ) (8.23)

where z = x − o. (8.24)

Properties:
• unimodal
• separable

Figure 8.17: f5 : R2 → R


Figure 8.18: Test function 5/28, D = 2, 2000 evaluations (box-and-whisker plot of the error for each algorithm)

Table 8.9: Position and Wilcoxon test p-value as a function of an algorithm.

Algorithm  P   p-value
GA         6   1.825 × 10⁻⁶
DE/R       2   1.825 × 10⁻⁵
DE/B       1   −
PSO        3   1.825 × 10⁻⁶
APSO1      9   1.825 × 10⁻⁶
APSO2      10  1.825 × 10⁻⁶
FA         5   1.825 × 10⁻⁶
CS         7   1.825 × 10⁻⁶
BA         8   1.825 × 10⁻⁶
FPA/R      11  1.825 × 10⁻⁶
FPA/B      4   2.700 × 10⁻⁶
GSA        12  1.825 × 10⁻⁶

Figure 8.19: Test function 5/28, D = 10, 10⁴ evaluations (box-and-whisker plot of the error for each algorithm)

Table 8.10: Position and Wilcoxon test p-value as a function of an algorithm.

Algorithm  P   p-value
GA         4   1.825 × 10⁻⁶
DE/R       2   1.825 × 10⁻⁵
DE/B       3   1.825 × 10⁻⁶
PSO        1   −
APSO1      9   1.825 × 10⁻⁶
APSO2      10  1.825 × 10⁻⁶
FA         7   1.825 × 10⁻⁶
CS         6   1.825 × 10⁻⁶
BA         8   1.825 × 10⁻⁶
FPA/R      12  1.825 × 10⁻⁶
FPA/B      11  1.825 × 10⁻⁶
GSA        5   1.825 × 10⁻⁶


Test function 6/28. Rotated Rosenbrock’s function

f₆(z) = ∑_{i=1}^{D−1} ( 100(z_i² − z_{i+1})² + (z_i − 1)² ) (8.25)

where

z = M₁ · ( 2.048(x − o)/100 ) + 1. (8.26)

Properties:
• multi-modal
• non-separable
• very narrow valley from local to global optimum

Figure 8.20: f6 : R2 → R


Figure 8.21: Test function 6/28, D = 2, 2000 evaluations (box-and-whisker plot of the error for each algorithm)

Table 8.11: Position and Wilcoxon test p-value as a function of an algorithm.

Algorithm  P   p-value
GA         12  1.825 × 10⁻⁶
DE/R       3   0.014
DE/B       1   −
PSO        7   1.825 × 10⁻⁶
APSO1      5   1.825 × 10⁻⁶
APSO2      6   1.825 × 10⁻⁶
FA         11  1.825 × 10⁻⁶
CS         8   1.825 × 10⁻⁶
BA         4   1.825 × 10⁻⁶
FPA/R      9   1.825 × 10⁻⁶
FPA/B      2   0.097
GSA        10  1.825 × 10⁻⁶

Figure 8.22: Test function 6/28, D = 10, 10⁴ evaluations (box-and-whisker plot of the error for each algorithm)

Table 8.12: Position and Wilcoxon test p-value as a function of an algorithm.

Algorithm  P   p-value
GA         10  1.825 × 10⁻⁶
DE/R       2   2.945 × 10⁻⁴
DE/B       6   8.881 × 10⁻⁶
PSO        1   −
APSO1      9   1.825 × 10⁻⁶
APSO2      8   1.825 × 10⁻⁶
FA         4   1.825 × 10⁻⁶
CS         3   1.717 × 10⁻⁵
BA         7   3.026 × 10⁻⁶
FPA/R      12  1.825 × 10⁻⁶
FPA/B      11  1.825 × 10⁻⁶
GSA        5   4.502 × 10⁻⁶


Test function 7/28. Rotated Schaffers F7 function

f₇(z) = ( 1/(D − 1) ∑_{i=1}^{D−1} ( √z_i + √z_i sin²(50 z_i^{0.2}) ) )², (8.27a)

z_i = √( y_i² + y_{i+1}² ) (8.27b)

where y = M₂ · Λ¹⁰ · T^{0.5}_asy(M₁ · (x − o)). (8.28)

The D-dimensional diagonal matrix Λ^α is given by the ith diagonal element

Λ^α_ii = α^{(i−1)/(2D−2)}. (8.29)

Properties:
• multi-modal
• non-separable
• asymmetrical
• large number of local optima

Figure 8.23: f7 : R2 → R


Figure 8.24: Test function 7/28, D = 2, 2000 evaluations (box-and-whisker plot of the error for each algorithm)

Table 8.13: Position and Wilcoxon test p-value as a function of an algorithm.

Algorithm  P   p-value
GA         10  1.825 × 10⁻⁶
DE/R       3   1.825 × 10⁻⁶
DE/B       1   −
PSO        2   2.717 × 10⁻⁵
APSO1      7   1.825 × 10⁻⁶
APSO2      8   1.825 × 10⁻⁶
FA         4   1.825 × 10⁻⁶
CS         9   1.825 × 10⁻⁶
BA         6   1.825 × 10⁻⁶
FPA/R      11  1.825 × 10⁻⁶
FPA/B      12  1.825 × 10⁻⁶
GSA        5   1.825 × 10⁻⁶

Figure 8.25: Test function 7/28, D = 10, 10⁴ evaluations (box-and-whisker plot of the error for each algorithm)

Table 8.14: Position and Wilcoxon test p-value as a function of an algorithm.

Algorithm  P   p-value
GA         1   −
DE/R       3   3.894 × 10⁻⁵
DE/B       6   1.825 × 10⁻⁶
PSO        8   2.717 × 10⁻⁵
APSO1      10  1.825 × 10⁻⁶
APSO2      5   1.825 × 10⁻⁶
FA         2   0.294
CS         7   1.825 × 10⁻⁶
BA         9   1.825 × 10⁻⁶
FPA/R      11  1.825 × 10⁻⁶
FPA/B      12  1.825 × 10⁻⁶
GSA        4   6.393 × 10⁻⁴


Test function 8/28. Rotated Ackley’s function

f₈(z) = −20 exp( −0.2 √( (1/D) ∑_{i=1}^{D} z_i² ) ) − exp( (1/D) ∑_{i=1}^{D} cos 2πz_i ) + 20 + e (8.30)

where z = M₂ · Λ¹⁰ · T^{0.5}_asy(M₁ · (x − o)). (8.31)

Properties:
• multi-modal
• non-separable
• asymmetrical

Figure 8.26: f8 : R2 → R


Figure 8.27: Test function 8/28, D = 2, 2000 evaluations (box-and-whisker plot of the error for each algorithm)

Table 8.15: Position and Wilcoxon test p-value as a function of an algorithm.

Algorithm  P   p-value
GA         3   1.013 × 10⁻⁴
DE/R       10  1.167 × 10⁻⁴
DE/B       7   0.636
PSO        2   0.046
APSO1      8   1.825 × 10⁻⁶
APSO2      5   2.717 × 10⁻⁵
FA         1   −
CS         6   3.256 × 10⁻⁵
BA         9   1.074 × 10⁻⁵
FPA/R      12  2.021 × 10⁻⁶
FPA/B      11  0.007
GSA        4   2.975 × 10⁻⁵

Figure 8.28: Test function 8/28, D = 10, 10⁴ evaluations (box-and-whisker plot of the error for each algorithm)

Table 8.16: Position and Wilcoxon test p-value as a function of an algorithm.

Algorithm  P   p-value
GA         11  0.026
DE/R       10  0.084
DE/B       12  4.650 × 10⁻⁵
PSO        6   0.217
APSO1      1   −
APSO2      4   0.365
FA         2   0.523
CS         7   0.422
BA         5   0.354
FPA/R      8   0.303
FPA/B      3   0.510
GSA        9   0.029


Test function 9/28. Rotated Weierstrass function

f₉(z) = ∑_{i=1}^{D} ∑_{k=0}^{20} 0.5^k cos( 2π 3^k (z_i + 0.5) ) − D ∑_{k=0}^{20} 0.5^k cos( π 3^k ) (8.32)

where

z = M₂ · Λ¹⁰ · T^{0.5}_asy( M₁ · ( 0.5(x − o)/100 ) ). (8.33)

Properties:
• multi-modal
• non-separable
• asymmetrical

Figure 8.29: f9 : R2 → R


Figure 8.30: Test function 9/28, D = 2, 2000 evaluations (box-and-whisker plot of the error for each algorithm)

Table 8.17: Position and Wilcoxon test p-value as a function of an algorithm.

Algorithm  P   p-value
GA         10  1.825 × 10⁻⁶
DE/R       4   1.825 × 10⁻⁶
DE/B       1   −
PSO        2   1.825 × 10⁻⁶
APSO1      7   1.825 × 10⁻⁶
APSO2      9   1.825 × 10⁻⁶
FA         3   1.825 × 10⁻⁶
CS         8   1.825 × 10⁻⁶
BA         6   1.825 × 10⁻⁶
FPA/R      12  1.825 × 10⁻⁶
FPA/B      11  2.702 × 10⁻⁶
GSA        5   1.825 × 10⁻⁶

Figure 8.31: Test function 9/28, D = 10, 10⁴ evaluations (box-and-whisker plot of the error for each algorithm)

Table 8.18: Position and Wilcoxon test p-value as a function of an algorithm.

Algorithm  P   p-value
GA         2   0.665
DE/R       11  1.825 × 10⁻⁶
DE/B       8   4.079 × 10⁻⁶
PSO        5   7.329 × 10⁻⁶
APSO1      6   2.237 × 10⁻⁶
APSO2      7   2.021 × 10⁻⁶
FA         1   −
CS         4   3.026 × 10⁻⁶
BA         9   1.825 × 10⁻⁶
FPA/R      10  1.825 × 10⁻⁶
FPA/B      12  1.825 × 10⁻⁶
GSA        3   0.006


Test function 10/28. Rotated Griewank’s function

f₁₀(z) = 1 + ∑_{i=1}^{D} z_i²/4000 − ∏_{i=1}^{D} cos( z_i/√i ) (8.34)

where z = Λ¹⁰⁰ · M₁ · ( 6(x − o) ). (8.35)

Properties:
• multi-modal
• non-separable
• rotated

Figure 8.32: f10 : R2 → R


Figure 8.33: Test function 10/28, D = 2, 2000 evaluations (box-and-whisker plot of the error for each algorithm)

Table 8.19: Position and Wilcoxon test p-value as a function of an algorithm.

Algorithm  P   p-value
GA         12  2.475 × 10⁻⁶
DE/R       2   0.346
DE/B       1   −
PSO        3   0.510
APSO1      8   1.825 × 10⁻⁶
APSO2      10  2.021 × 10⁻⁶
FA         4   0.037
CS         7   4.967 × 10⁻⁶
BA         6   8.881 × 10⁻⁶
FPA/R      11  1.825 × 10⁻⁶
FPA/B      5   1.304 × 10⁻⁴
GSA        9   1.825 × 10⁻⁶

Figure 8.34: Test function 10/28, D = 10, 10⁴ evaluations (box-and-whisker plot of the error for each algorithm)

Table 8.20: Position and Wilcoxon test p-value as a function of an algorithm.

Algorithm  P   p-value
GA         7   2.237 × 10⁻⁶
DE/R       2   0.022
DE/B       8   1.181 × 10⁻⁵
PSO        3   0.003
APSO1      9   1.825 × 10⁻⁶
APSO2      10  1.825 × 10⁻⁶
FA         1   −
CS         5   1.825 × 10⁻⁶
BA         6   1.825 × 10⁻⁶
FPA/R      12  1.825 × 10⁻⁶
FPA/B      11  1.825 × 10⁻⁶
GSA        4   1.825 × 10⁻⁶


Test function 11/28. Rastrigin’s function

f₁₁(z) = ∑_{i=1}^{D} ( 10 + z_i² − 10 cos 2πz_i ) (8.36)

where

z = Λ¹⁰ · T^{0.2}_asy( T_osz( 5.12(x − o)/100 ) ). (8.37)

Properties:
• multi-modal
• separable
• asymmetrical
• large number of local optima

Figure 8.35: f11 : R2 → R


Figure 8.36: Test function 11/28, D = 2, 2000 evaluations (box-and-whisker plot of the error for each algorithm)

Table 8.21: Position and Wilcoxon test p-value as a function of an algorithm.

Algorithm  P   p-value
GA         3   6.894 × 10⁻⁴
DE/R       1   −
DE/B       9   4.601 × 10⁻⁴
PSO        2   0.411
APSO1      6   1.199 × 10⁻⁴
APSO2      5   7.197 × 10⁻⁵
FA         7   7.197 × 10⁻⁵
CS         8   2.066 × 10⁻⁵
BA         10  8.070 × 10⁻⁶
FPA/R      11  1.825 × 10⁻⁶
FPA/B      12  1.168 × 10⁻⁵
GSA        4   3.188 × 10⁻⁴

Figure 8.37: Test function 11/28, D = 10, 10⁴ evaluations (box-and-whisker plot of the error for each algorithm)

Table 8.22: Position and Wilcoxon test p-value as a function of an algorithm.

Algorithm  P   p-value
GA         3   9.306 × 10⁻⁵
DE/R       5   6.393 × 10⁻⁴
DE/B       6   1.883 × 10⁻⁵
PSO        4   1.199 × 10⁻⁴
APSO1      8   1.825 × 10⁻⁶
APSO2      7   1.825 × 10⁻⁶
FA         1   −
CS         9   1.825 × 10⁻⁶
BA         10  2.021 × 10⁻⁶
FPA/R      11  1.825 × 10⁻⁶
FPA/B      12  1.168 × 10⁻⁵
GSA        2   0.376

Test function 12/28. Rotated Rastrigin's function

f_{12}(z) = \sum_{i=1}^{D} \left( 10 + z_i^2 - 10 \cos 2\pi z_i \right)    (8.38)

where
z = M_1 \cdot \Lambda^{10} \cdot M_2 \cdot T_{asy}^{0.2}\!\left( T_{osz}\!\left( M_1 \cdot \tfrac{5.12}{100}\,(x - o) \right) \right).    (8.39)

Properties:
• multi-modal
• non-separable
• asymmetrical
• large number of local optima

Figure 8.38: f12 : R² → R

Figure 8.39: Test function 12/28, D = 2, 2000 evaluations (error of each algorithm)

Table 8.23: Position and Wilcoxon test p-value as a function of an algorithm.

Algorithm P p-value

GA 11 1.425 × 10−5

DE/R 2 0.376DE/B 8 0.104PSO 6 0.651APSO1 1 −APSO2 4 0.007FA 3 0.837CS 7 0.007BA 9 0.018FPA/R 10 2.475 × 10−6

FPA/B 12 3.344 × 10−6

GSA 5 0.061

Figure 8.40: Test function 12/28, D = 10, 10⁴ evaluations (error of each algorithm)

Table 8.24: Position and Wilcoxon test p-value as a function of an algorithm.

Algorithm P p-value

GA 3 0.257DE/R 6 2.021 × 10−6

DE/B 5 1.825 × 10−6

PSO 4 2.727 × 10−6

APSO1 8 1.825 × 10−6

APSO2 7 1.825 × 10−6

FA 2 0.510CS 9 1.825 × 10−6

BA 10 1.825 × 10−6

FPA/R 11 1.825 × 10−6

FPA/B 12 1.825 × 10−6

GSA 1 −

Test function 13/28. Non-continuous rotated Rastrigin's function

f_{13}(z) = \sum_{i=1}^{D} \left( 10 + z_i^2 - 10 \cos 2\pi z_i \right)    (8.40)

where
z = M_1 \cdot \Lambda^{10} \cdot M_2 \cdot T_{asy}^{0.2}\left( T_{osz}(y) \right),    (8.41)

y_i = \begin{cases} w_i & \text{if } |w_i| \le 0.5, \\ \tfrac{1}{2}\lfloor 2 w_i \rceil & \text{if } |w_i| > 0.5, \end{cases}    (8.42)

w = M_1 \cdot \tfrac{5.12}{100}\,(x - o),    (8.43)

where ⌊·⌉ denotes rounding to the nearest integer.

Properties:
• multi-modal
• rotated
• non-separable
• asymmetrical
• large number of local optima

Figure 8.41: f13 : R² → R

Figure 8.42: Test function 13/28, D = 2, 2000 evaluations (error of each algorithm)

Table 8.25: Position and Wilcoxon test p-value as a function of an algorithm.

Algorithm P p-value

GA 9 0.037DE/R 4 0.015DE/B 10 5.545 × 10−5

PSO 6 0.376APSO1 1 −APSO2 2 0.108FA 3 0.334CS 5 0.004BA 7 0.067FPA/R 11 2.475 × 10−6

FPA/B 12 2.717 × 10−5

GSA 8 7.329 × 10−6

Figure 8.43: Test function 13/28, D = 10, 10⁴ evaluations (error of each algorithm)

Table 8.26: Position and Wilcoxon test p-value as a function of an algorithm.

Algorithm P p-value

GA 3 1.167 × 10−4

DE/R 4 2.021 × 10−6

DE/B 8 2.021 × 10−6

PSO 5 2.021 × 10−6

APSO1 6 1.825 × 10−6

APSO2 7 1.825 × 10−6

FA 1 −CS 10 1.825 × 10−6

BA 9 1.825 × 10−6

FPA/R 11 1.825 × 10−6

FPA/B 12 1.825 × 10−6

GSA 2 0.013

Test function 14/28. Schwefel's function

f_{14}(z) = 418.9829\,D - \sum_{i=1}^{D} g(z_i)    (8.44)

where
z = \Lambda^{10} \cdot \left( 10\,(x - o) \right) + 420.9687,    (8.45)

g(z_i) = \begin{cases}
z_i \sin\sqrt{|z_i|} & \text{if } |z_i| \le 500, \\
\left( 500 - z_i \bmod 500 \right) \sin\sqrt{\left| 500 - z_i \bmod 500 \right|} - \dfrac{(z_i - 500)^2}{10^4 D} & \text{if } z_i > 500, \\
\left( |z_i| \bmod 500 - 500 \right) \sin\sqrt{\left| |z_i| \bmod 500 - 500 \right|} - \dfrac{(z_i + 500)^2}{10^4 D} & \text{if } z_i < -500.
\end{cases}    (8.46)

Properties:
• multi-modal
• non-separable
• asymmetrical
• large number of local optima

Figure 8.44: f14 : R² → R
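The piecewise penalty g in (8.46) is easy to get wrong. A short Wolfram Language sketch of the reconstruction above (the z_i < −500 branch and the absolute values under the square roots follow the test-suite definition [15]; dim stands for the dimension D):

(* g(zi) of (8.46); dim is the problem dimension D *)
g[zi_, dim_] := Which[
  Abs[zi] <= 500, zi Sin[Sqrt[Abs[zi]]],
  zi > 500,  (500 - Mod[zi, 500]) Sin[Sqrt[Abs[500 - Mod[zi, 500]]]] - (zi - 500)^2/(10^4 dim),
  zi < -500, (Mod[Abs[zi], 500] - 500) Sin[Sqrt[Abs[Mod[Abs[zi], 500] - 500]]] - (zi + 500)^2/(10^4 dim)
]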

Figure 8.45: Test function 14/28, D = 2, 2000 evaluations (error of each algorithm)

Table 8.27: Position and Wilcoxon test p-value as a function of an algorithm.

Algorithm P p-value

GA 4 0.012DE/R 1 −DE/B 8 0.010PSO 2 0.491APSO1 10 8.070 × 10−6

APSO2 7 1.013 × 10−4

FA 5 0.051CS 6 4.256 × 10−5

BA 11 2.481 × 10−5

FPA/R 9 1.825 × 10−6

FPA/B 12 7.329 × 10−6

GSA 3 0.008

Figure 8.46: Test function 14/28, D = 10, 10⁴ evaluations (error of each algorithm)

Table 8.28: Position and Wilcoxon test p-value as a function of an algorithm.

Algorithm P p-value

GA 1 −DE/R 7 8.069 × 10−6

DE/B 5 2.717 × 10−5

PSO 2 0.029APSO1 10 1.825 × 10−6

APSO2 11 1.825 × 10−6

FA 3 0.024CS 6 1.825 × 10−6

BA 9 2.021 × 10−6

FPA/R 12 1.825 × 10−6

FPA/B 8 1.825 × 10−6

GSA 4 0.020

Test function 15/28. Rotated Schwefel's function

f_{15}(z) = 418.9829\,D - \sum_{i=1}^{D} g(z_i)    (8.47)

where
z = \Lambda^{10} \cdot M_1 \cdot \left( 10\,(x - o) \right) + 420.9687,    (8.48)

g(z_i) = \begin{cases}
z_i \sin\sqrt{|z_i|} & \text{if } |z_i| \le 500, \\
\left( 500 - z_i \bmod 500 \right) \sin\sqrt{\left| 500 - z_i \bmod 500 \right|} - \dfrac{(z_i - 500)^2}{10^4 D} & \text{if } z_i > 500, \\
\left( |z_i| \bmod 500 - 500 \right) \sin\sqrt{\left| |z_i| \bmod 500 - 500 \right|} - \dfrac{(z_i + 500)^2}{10^4 D} & \text{if } z_i < -500.
\end{cases}    (8.49)

Properties:
• multi-modal
• rotated
• non-separable
• asymmetrical
• large number of local optima

Figure 8.47: f15 : R² → R

Figure 8.48: Test function 15/28, D = 2, 2000 evaluations (error of each algorithm)

Table 8.29: Position and Wilcoxon test p-value as a function of an algorithm.

Algorithm P p-value

GA 7 1.102 × 10−4

DE/R 2 0.036DE/B 6 0.005PSO 10 0.002APSO1 8 3.561 × 10−5

APSO2 4 5.927 × 10−4

FA 1 −CS 3 0.001BA 12 1.825 × 10−6

FPA/R 9 1.717 × 10−5

FPA/B 11 0.001GSA 5 0.009

Figure 8.49: Test function 15/28, D = 10, 10⁴ evaluations (error of each algorithm)

Table 8.30: Position and Wilcoxon test p-value as a function of an algorithm.

Algorithm P p-value

GA 3 0.003DE/R 12 1.825 × 10−6

DE/B 11 1.825 × 10−6

PSO 4 1.825 × 10−6

APSO1 7 1.825 × 10−6

APSO2 9 1.825 × 10−6

FA 2 0.043CS 5 2.021 × 10−6

BA 6 1.825 × 10−6

FPA/R 10 1.825 × 10−6

FPA/B 8 1.825 × 10−6

GSA 1 −

Test function 16/28. Rotated Katsuura function

f_{16}(z) = \frac{10}{D^2} \prod_{i=1}^{D} \left( 1 + \sum_{j=1}^{32} \frac{\left| 2^j z_i - \lfloor 2^j z_i \rceil \right|}{2^j} \right)^{10 D^{-1.2}} - \frac{10}{D^2}    (8.50)

where
z = M_2 \cdot \Lambda^{100} \cdot M_1 \cdot \tfrac{5}{100}\,(x - o).    (8.51)

Properties:
• multi-modal
• continuous
• non-separable
• asymmetrical
• non-differentiable

Figure 8.50: f16 : R² → R

Figure 8.51: Test function 16/28, D = 2, 2000 evaluations (error of each algorithm)

Table 8.31: Position and Wilcoxon test p-value as a function of an algorithm.

Algorithm P p-value

GA 8 5.493 × 10−4

DE/R 9 0.003DE/B 12 2.975 × 10−5

PSO 2 0.334APSO1 5 6.394 × 10−4

APSO2 4 0.018FA 3 0.294CS 10 0.003BA 7 0.002FPA/R 11 3.451 × 10−4

FPA/B 1 −GSA 6 0.002

Figure 8.52: Test function 16/28, D = 10, 10⁴ evaluations (error of each algorithm)

Table 8.32: Position and Wilcoxon test p-value as a function of an algorithm.

Algorithm P p-value

GA 2 7.429 × 10−4

DE/R 11 1.825 × 10−6

DE/B 12 1.825 × 10−6

PSO 3 2.475 × 10−6

APSO1 6 1.825 × 10−6

APSO2 10 1.825 × 10−6

FA 1 −CS 4 1.825 × 10−6

BA 9 1.825 × 10−6

FPA/R 8 1.825 × 10−6

FPA/B 5 1.825 × 10−6

GSA 7 1.825 × 10−6

Test function 17/28. Lunacek bi-Rastrigin function

f_{17}(z) = \min\left( \sum_{i=1}^{D} (\hat{x}_i - \mu_0)^2,\; D + s \sum_{i=1}^{D} (\hat{x}_i - \mu_1)^2 \right) + 10 D - 10 \sum_{i=1}^{D} \cos 2\pi z_i    (8.52)

where
z = \Lambda^{100} \cdot (\hat{x} - \mu_0),    (8.53)
\hat{x}_i = \mu_0 + 2 y_i \operatorname{sgn} o_i,    (8.54)
y = \tfrac{1}{10}\,(x - o),    (8.55)
\mu_0 = 2.5,    (8.56)
\mu_1 = -\sqrt{\mu_0^2 / s},    (8.57)
s = 1 - \frac{1}{2\sqrt{D + 20} - 8.2}.    (8.58)

Figure 8.53: f17 : R² → R

Figure 8.54: Test function 17/28, D = 2, 2000 evaluations (error of each algorithm)

Table 8.33: Position and Wilcoxon test p-value as a function of an algorithm.

Algorithm P p-value

GA 8 0.007DE/R 3 0.365DE/B 5 0.156PSO 4 0.258APSO1 6 0.100APSO2 2 0.869FA 9 0.007CS 1 −BA 7 0.046FPA/R 11 0.006FPA/B 12 6.893 × 10−4

GSA 10 9.279 × 10−4

Figure 8.55: Test function 17/28, D = 10, 10⁴ evaluations (error of each algorithm)

Table 8.34: Position and Wilcoxon test p-value as a function of an algorithm.

Algorithm P p-value

GA 2 2.717 × 10−5

DE/R 4 9.770 × 10−6

DE/B 6 2.021 × 10−4

PSO 3 8.620 × 10−4

APSO1 8 1.825 × 10−6

APSO2 9 1.825 × 10−6

FA 1 −CS 10 1.825 × 10−6

BA 7 1.825 × 10−6

FPA/R 11 1.825 × 10−6

FPA/B 12 1.825 × 10−6

GSA 5 1.825 × 10−6

Test function 18/28. Rotated Lunacek bi-Rastrigin function

f_{18}(z) = \min\left( \sum_{i=1}^{D} (\hat{x}_i - \mu_0)^2,\; D + s \sum_{i=1}^{D} (\hat{x}_i - \mu_1)^2 \right) + 10 D - 10 \sum_{i=1}^{D} \cos 2\pi z_i    (8.59)

where
z = M_2 \cdot \Lambda^{100} \cdot M_1 \cdot (\hat{x} - \mu_0),    (8.60)
\hat{x}_i = \mu_0 + 2 y_i \operatorname{sgn} o_i,    (8.61)
y = \tfrac{1}{10}\,(x - o),    (8.62)
\mu_0 = 2.5,    (8.63)
\mu_1 = -\sqrt{\mu_0^2 / s},    (8.64)
s = 1 - \frac{1}{2\sqrt{D + 20} - 8.2}.    (8.65)

Properties:
• multi-modal
• continuous
• non-separable
• asymmetrical
• non-differentiable

Figure 8.56: f18 : R² → R

Figure 8.57: Test function 18/28, D = 2, 2000 evaluations (error of each algorithm)

Table 8.35: Position and Wilcoxon test p-value as a function of an algorithm.

Algorithm P p-value

GA 10 2.237 × 10−6

DE/R 6 0.026DE/B 9 0.002PSO 1 −APSO1 3 0.128APSO2 4 0.019FA 7 0.241CS 5 0.181BA 2 0.046FPA/R 11 3.694 × 10−6

FPA/B 12 1.181 × 10−5

GSA 8 7.429 × 10−4

Figure 8.58: Test function 18/28, D = 10, 10⁴ evaluations (error of each algorithm)

Table 8.36: Position and Wilcoxon test p-value as a function of an algorithm.

Algorithm P p-value

GA 2 0.001DE/R 5 1.825 × 10−6

DE/B 10 1.825 × 10−6

PSO 3 3.732 × 10−4

APSO1 6 1.825 × 10−6

APSO2 8 1.825 × 10−6

FA 1 −CS 9 1.825 × 10−6

BA 7 1.825 × 10−6

FPA/R 11 1.825 × 10−6

FPA/B 12 1.825 × 10−6

GSA 4 4.079 × 10−6

Test function 19/28. Rotated expanded Griewank's plus Rosenbrock's function

f_{19}(z) = f_{10}\left( f_6(z_D, z_1) \right) + \sum_{i=1}^{D-1} f_{10}\left( f_6(z_i, z_{i+1}) \right)    (8.66)

where
z = M_1 \cdot \tfrac{5}{100}\,(x - o) + 1.    (8.67)

Properties:
• multi-modal
• non-separable

Figure 8.59: f19 : R² → R

Figure 8.60: Test function 19/28, D = 2, 2000 evaluations (error of each algorithm)

Table 8.37: Position and Wilcoxon test p-value as a function of an algorithm.

Algorithm P p-value

GA 11 1.825 × 10−6

DE/R 1 −DE/B 4 0.013PSO 5 1.077 × 10−4

APSO1 3 2.066 × 10−5

APSO2 7 4.967 × 10−6

FA 8 2.737 × 10−6

CS 2 1.417 × 10−4

BA 6 1.825 × 10−6

FPA/R 12 1.825 × 10−6

FPA/B 10 8.767 × 10−6

GSA 9 1.825 × 10−6

Figure 8.61: Test function 19/28, D = 10, 10⁴ evaluations (error of each algorithm)

Table 8.38: Position and Wilcoxon test p-value as a function of an algorithm.

Algorithm P p-value

GA 3 2.136 × 10−4

DE/R 5 3.026 × 10−6

DE/B 6 1.564 × 10−5

PSO 2 0.046APSO1 9 1.825 × 10−6

APSO2 10 1.825 × 10−6

FA 1 −CS 8 1.825 × 10−6

BA 7 1.825 × 10−6

FPA/R 11 1.825 × 10−6

FPA/B 12 1.825 × 10−6

GSA 4 1.825 × 10−6

Test function 20/28. Rotated expanded Schaffer's F6 function

f_{20}(z) = g(z_D, z_1) + \sum_{i=1}^{D-1} g(z_i, z_{i+1}),    (8.68a)

g(x, y) = \frac{1}{2} + \frac{\sin^2\sqrt{x^2 + y^2} - \frac{1}{2}}{\left( 1 + 10^{-3}\,(x^2 + y^2) \right)^2}    (8.68b)

where
z = M_2 \cdot T_{asy}^{0.5}\left( M_1 \cdot (x - o) \right).    (8.69)

Properties:
• multi-modal
• non-separable
• asymmetrical

Figure 8.62: f20 : R² → R
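A short Wolfram Language sketch of (8.68a)–(8.68b), assuming the transformed vector z of (8.69) is already given:

(* pair function g of (8.68b) *)
g[x_, y_] := 1/2 + (Sin[Sqrt[x^2 + y^2]]^2 - 1/2)/(1 + 10^-3 (x^2 + y^2))^2
(* cyclic expansion of (8.68a) over consecutive components of z *)
f20[z_List] := g[Last[z], First[z]] + Sum[g[z[[i]], z[[i + 1]]], {i, 1, Length[z] - 1}]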

Figure 8.63: Test function 20/28, D = 2, 2000 evaluations (error of each algorithm)

Table 8.39: Position and Wilcoxon test p-value as a function of an algorithm.

Algorithm P p-value

GA 8 0.025DE/R 1 −DE/B 6 0.198PSO 3 0.531APSO1 4 0.069APSO2 5 8.620 × 10−4

FA 11 1.884 × 10−5

CS 7 8.881 × 10−6

BA 2 0.036FPA/R 10 1.825 × 10−6

FPA/B 9 0.008GSA 12 1.825 × 10−6

Figure 8.64: Test function 20/28, D = 10, 10⁴ evaluations (error of each algorithm)

Table 8.40: Position and Wilcoxon test p-value as a function of an algorithm.

Algorithm P p-value

GA 2 0.805DE/R 12 3.344 × 10−6

DE/B 6 0.002PSO 1 −APSO1 9 0.001APSO2 3 0.365FA 4 0.376CS 5 0.010BA 7 0.001FPA/R 11 3.026 × 10−6

FPA/B 10 2.729 × 10−4

GSA 8 0.001

Test function 21/28. Composition function

f_{21}(z) = \sum_{i \in \{1,2,3,4,5\}} \omega_i \left( \lambda_i f_i(z) + b_i \right)    (8.70)

where
\omega_i = \frac{w_i}{\sum_{i \in \{1,2,3,4,5\}} w_i},    (8.71)

w_i = \frac{\exp\left( -\dfrac{\sum_{j=1}^{D} (x_j - o_{ij})^2}{2 D \sigma_i^2} \right)}{\sqrt{\sum_{j=1}^{D} (x_j - o_{ij})^2}}.    (8.72)

Properties:
• multi-modal
• non-separable
• asymmetrical

Figure 8.65: f21 : R² → R

Table 8.41: f21 coefficients

σi   λi       bi
10   100      0
20   10−6     100
30   10−26    200
40   10−6     300
50   10−1     400
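The weighting scheme (8.70)–(8.72) is common to all composition functions below. A hedged Wolfram Language sketch (the shift vectors oMat, the coefficient lists and the pre-evaluated component values fVals are all assumed to be given; the degenerate case x = o_i, which needs a special weight, is omitted for brevity):

(* composition value of (8.70) with weights (8.71)-(8.72)                 *)
(* oMat: list of shift vectors o_i; sigma, lambda, b: coefficient lists;  *)
(* fVals: the component values f_i(z) evaluated beforehand                *)
compose[x_, oMat_, sigma_, lambda_, b_, fVals_] := Module[{dsq, w, omega},
  dsq   = Total[(x - #)^2] & /@ oMat;                 (* squared distances to the o_i *)
  w     = Exp[-dsq/(2 Length[x] sigma^2)]/Sqrt[dsq];  (* unnormalised weights (8.72)  *)
  omega = w/Total[w];                                 (* normalised weights (8.71)    *)
  omega . (lambda fVals + b)                          (* weighted sum (8.70)          *)
]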

Figure 8.66: Test function 21/28, D = 2, 2000 evaluations (error of each algorithm)

Table 8.42: Position and Wilcoxon test p-value as a function of an algorithm.

Algorithm P p-value

GA 11 1.199 × 10−4

DE/R 4 0.002DE/B 6 0.651PSO 2 3.732 × 10−4

APSO1 9 0.004APSO2 3 0.593FA 7 8.620 × 10−4

CS 1 −BA 8 0.061FPA/R 5 4.079 × 10−6

FPA/B 12 1.199 × 10−4

GSA 10 0.001

Figure 8.67: Test function 21/28, D = 10, 10⁴ evaluations (error of each algorithm)

Table 8.43: Position and Wilcoxon test p-value as a function of an algorithm.

Algorithm P p-value

GA 2 1.425 × 10−5

DE/R 9 2.475 × 10−6

DE/B 3 3.344 × 10−6

PSO 6 2.475 × 10−6

APSO1 8 2.737 × 10−6

APSO2 5 6.038 × 10−6

FA 7 2.475 × 10−6

CS 1 −BA 4 3.344 × 10−6

FPA/R 12 2.475 × 10−6

FPA/B 11 2.475 × 10−6

GSA 10 2.475 × 10−6

Test function 22/28. Composition function

f_{22}(z) = \sum_{i=1}^{3} \omega_i \left( \lambda_i f_{14}(z) + b_i \right)    (8.73)

where
\omega_i = \frac{w_i}{\sum_{i=1}^{3} w_i}.    (8.74)

Properties:
• multi-modal
• separable
• asymmetrical

Figure 8.68: f22 : R² → R

Table 8.44: f22 coefficients

σi   λi   bi
20   1    0
20   1    100
20   1    200

Figure 8.69: Test function 22/28, D = 2, 2000 evaluations (error of each algorithm)

Table 8.45: Position and Wilcoxon test p-value as a function of an algorithm.

Algorithm P p-value

GA 2 0.579DE/R 1 −DE/B 7 0.026PSO 3 0.593APSO1 10 0.001APSO2 8 2.361 × 10−4

FA 5 0.017CS 6 5.927 × 10−4

BA 9 9.307 × 10−5

FPA/R 11 2.021 × 10−6

FPA/B 12 2.066 × 10−5

GSA 4 0.181

Figure 8.70: Test function 22/28, D = 10, 10⁴ evaluations (error of each algorithm)

Table 8.46: Position and Wilcoxon test p-value as a function of an algorithm.

Algorithm P p-value

GA 3 0.133DE/R 4 0.002DE/B 6 1.883 × 10−4

PSO 2 0.410APSO1 8 1.825 × 10−6

APSO2 9 1.825 × 10−6

FA 1 −CS 7 2.021 × 10−6

BA 10 1.825 × 10−6

FPA/R 12 1.825 × 10−6

FPA/B 11 2.237 × 10−6

GSA 5 3.188 × 10−4

Test function 23/28. Composition function

f_{23}(z) = \sum_{i=1}^{3} \omega_i \left( \lambda_i f_{15}(z) + b_i \right)    (8.75)

where
\omega_i = \frac{w_i}{\sum_{i=1}^{3} w_i}.    (8.76)

Properties:
• multi-modal
• non-separable
• asymmetrical

Figure 8.71: f23 : R² → R

Table 8.47: f23 coefficients

σi   λi   bi
20   1    0
20   1    100
20   1    200

Figure 8.72: Test function 23/28, D = 2, 2000 evaluations (error of each algorithm)

Table 8.48: Position and Wilcoxon test p-value as a function of an algorithm.

Algorithm P p-value

GA 8 3.256 × 10−5

DE/R 2 0.387DE/B 9 8.005 × 10−4

PSO 4 0.294APSO1 7 7.430 × 10−4

APSO2 5 0.007FA 1 −CS 6 0.001BA 11 8.546 × 10−5

FPA/R 10 1.181 × 10−5

FPA/B 12 4.079 × 10−6

GSA 3 0.007

Figure 8.73: Test function 23/28, D = 10, 10⁴ evaluations (error of each algorithm)

Table 8.49: Position and Wilcoxon test p-value as a function of an algorithm.

Algorithm P p-value

GA 1 −DE/R 12 1.825 × 10−6

DE/B 11 2.021 × 10−6

PSO 4 6.038 × 10−6

APSO1 8 1.825 × 10−6

APSO2 7 1.825 × 10−6

FA 2 0.789CS 5 1.825 × 10−6

BA 6 1.825 × 10−6

FPA/R 10 1.825 × 10−6

FPA/B 9 1.825 × 10−6

GSA 3 0.001

Test function 24/28. Composition function

f_{24}(z) = \sum_{i \in \{9,12,15\}} \omega_i \left( \lambda_i f_i(z) + b_i \right)    (8.77)

where
\omega_i = \frac{w_i}{\sum_{i \in \{9,12,15\}} w_i}.    (8.78)

Properties:
• multi-modal
• non-separable
• asymmetrical

Figure 8.74: f24 : R² → R

Table 8.50: f24 coefficients

σi   λi     bi
20   0.25   0
20   1      100
20   2.5    200

Figure 8.75: Test function 24/28, D = 2, 2000 evaluations (error of each algorithm)

Table 8.51: Position and Wilcoxon test p-value as a function of an algorithm.

Algorithm P p-value

GA 10 2.975 × 10−5

DE/R 1 −DE/B 6 0.056PSO 2 0.666APSO1 7 0.034APSO2 4 0.002FA 9 2.975 × 10−5

CS 3 6.051 × 10−5

BA 11 4.255 × 10−5

FPA/R 8 2.737 × 10−6

FPA/B 12 1.304 × 10−4

GSA 5 0.006

Figure 8.76: Test function 24/28, D = 10, 10⁴ evaluations (error of each algorithm)

Table 8.52: Position and Wilcoxon test p-value as a function of an algorithm.

Algorithm P p-value

GA 4 2.237 × 10−6

DE/R 11 1.825 × 10−6

DE/B 6 1.825 × 10−6

PSO 7 2.021 × 10−6

APSO1 9 1.825 × 10−6

APSO2 5 3.026 × 10−6

FA 2 1.564 × 10−5

CS 1 −BA 10 1.825 × 10−6

FPA/R 8 2.737 × 10−6

FPA/B 12 2.475 × 10−6

GSA 3 3.255 × 10−5

Test function 25/28. Composition function

f_{25}(z) = \sum_{i \in \{9,12,15\}} \omega_i \left( \lambda_i f_i(z) + b_i \right)    (8.79)

where
\omega_i = \frac{w_i}{\sum_{i \in \{9,12,15\}} w_i}.    (8.80)

Properties:
• multi-modal
• non-separable
• asymmetrical

Figure 8.77: f25 : R² → R

Table 8.53: f25 coefficients

σi   λi     bi
10   0.25   0
30   1      100
50   2.5    200

Figure 8.78: Test function 25/28, D = 2, 2000 evaluations (error of each algorithm)

Table 8.54: Position and Wilcoxon test p-value as a function of an algorithm.

Algorithm P p-value

GA 10 5.554 × 10−5

DE/R 5 0.122DE/B 9 0.001PSO 3 0.837APSO1 8 0.010APSO2 1 −FA 11 2.481 × 10−5

CS 1 −BA 6 0.022FPA/R 4 0.009FPA/B 12 2.483 × 10−5

GSA 7 0.001

Figure 8.79: Test function 25/28, D = 10, 10⁴ evaluations (error of each algorithm)

Table 8.55: Position and Wilcoxon test p-value as a function of an algorithm.

Algorithm P p-value

GA 2 0.471DE/R 10 1.825 × 10−6

DE/B 7 2.021 × 10−6

PSO 6 1.825 × 10−6

APSO1 9 1.825 × 10−6

APSO2 4 4.035 × 10−4

FA 1 −CS 3 0.010BA 8 2.021 × 10−6

FPA/R 11 1.825 × 10−6

FPA/B 12 1.825 × 10−6

GSA 5 0.004

Test function 26/28. Composition function

f_{26}(z) = \sum_{i \in \{2,9,10,12,15\}} \omega_i \left( \lambda_i f_i(z) + b_i \right)    (8.81)

where
\omega_i = \frac{w_i}{\sum_{i \in \{2,9,10,12,15\}} w_i}.    (8.82)

Properties:
• multi-modal
• non-separable
• asymmetrical

Figure 8.80: f26 : R² → R

Table 8.56: f26 coefficients

σi   λi     bi
10   0.25   0
10   1      100
10   10−7   200
10   2.5    300
10   10     400

Figure 8.81: Test function 26/28, D = 2, 2000 evaluations (error of each algorithm)

Table 8.57: Position and Wilcoxon test p-value as a function of an algorithm.

Algorithm P p-value

GA 9 5.927 × 10−4

DE/R 3 0.016DE/B 10 0.188PSO 4 0.019APSO1 6 0.885APSO2 1 −FA 11 2.737 × 10−6

CS 2 0.003BA 8 0.006FPA/R 7 4.967 × 10−6

FPA/B 12 1.297 × 10−5

GSA 5 0.150

Figure 8.82: Test function 26/28, D = 10, 10⁴ evaluations (error of each algorithm)

Table 8.58: Position and Wilcoxon test p-value as a function of an algorithm.

Algorithm P p-value

GA 2 0.422DE/R 7 1.825 × 10−6

DE/B 5 6.393 × 10−4

PSO 6 0.001APSO1 10 3.255 × 10−5

APSO2 8 1.297 × 10−5

FA 3 0.188CS 1 −BA 9 3.255 × 10−5

FPA/R 4 1.825 × 10−6

FPA/B 11 2.475 × 10−6

GSA 12 2.475 × 10−6

Test function 27/28. Composition function

f_{27}(z) = \sum_{i \in \{1,9,10,12,15\}} \omega_i \left( \lambda_i f_i(z) + b_i \right)    (8.83)

where
\omega_i = \frac{w_i}{\sum_{i \in \{1,9,10,12,15\}} w_i}.    (8.84)

Properties:
• multi-modal
• non-separable
• asymmetrical

Figure 8.83: f27 : R² → R

Table 8.59: f27 coefficients

σi   λi    bi
10   100   0
10   10    100
10   2.5   200
20   25    300
20   0.1   400

Figure 8.84: Test function 27/28, D = 2, 2000 evaluations (error of each algorithm)

Table 8.60: Position and Wilcoxon test p-value as a function of an algorithm.

Algorithm P p-value

GA 5 0.025DE/R 1 −DE/B 7 0.012PSO 3 0.419APSO1 9 0.004APSO2 4 0.225FA 11 1.672 × 10−4

CS 2 0.621BA 8 0.004FPA/R 6 0.004FPA/B 10 2.136 × 10−4

GSA 12 8.070 × 10−6

Figure 8.85: Test function 27/28, D = 10, 10⁴ evaluations (error of each algorithm)

Table 8.61: Position and Wilcoxon test p-value as a function of an algorithm.

Algorithm P p-value

GA 1 −DE/R 10 2.737 × 10−6

DE/B 5 7.429 × 10−4

PSO 7 5.493 × 10−4

APSO1 6 4.035 × 10−4

APSO2 9 7.196 × 10−5

FA 2 0.161CS 4 0.058BA 8 2.945 × 10−4

FPA/R 11 1.825 × 10−6

FPA/B 12 1.825 × 10−6

GSA 3 8.076

Test function 28/28. Composition function

f_{28}(z) = \sum_{i \in \{1,7,15,19,20\}} \omega_i \left( \lambda_i f_i(z) + b_i \right)    (8.85)

where
\omega_i = \frac{w_i}{\sum_{i \in \{1,7,15,19,20\}} w_i}.    (8.86)

Properties:
• multi-modal
• non-separable
• asymmetrical

Figure 8.86: f28 : R² → R

Table 8.62: f28 coefficients

σi   λi           bi
10   2.5          0
20   2.5 × 10−3   100
30   2.5          200
40   5 × 10−4     300
50   0.1          400

Figure 8.87: Test function 28/28, D = 2, 2000 evaluations (error of each algorithm)

Table 8.63: Position and Wilcoxon test p-value as a function of an algorithm.

Algorithm P p-value

GA 8 0.001DE/R 4 0.209DE/B 9 1.425 × 10−5

PSO 2 0.680APSO1 7 0.007APSO2 3 0.537FA 6 0.006CS 1 −BA 10 9.307 × 10−5

FPA/R 5 2.974 × 10−5

FPA/B 12 4.967 × 10−6

GSA 11 1.110 × 10−4

Figure 8.88: Test function 28/28, D = 10, 10⁴ evaluations (error of each algorithm)

Table 8.64: Position and Wilcoxon test p-value as a function of an algorithm.

Algorithm P p-value

GA 2 0.024DE/R 4 0.008DE/B 7 2.481 × 10−4

PSO 6 1.303 × 10−5

APSO1 10 2.475 × 10−6

APSO2 8 2.737 × 10−6

FA 3 0.014CS 1 −BA 9 6.050 × 10−5

FPA/R 11 1.825 × 10−6

FPA/B 12 1.825 × 10−6

GSA 5 5.544 × 10−5


8.4 Multiple-problem statistical analysis

Statistical analysis of the algorithms over the optimisation problems follows the method given in [8, 9]. The method comprises the use of non-parametric statistical tests. The problem is referred to as a multiple-problem analysis and considers a comparison of several algorithms over more than one problem (function) simultaneously. As previously, the test suite [15] includes all 28 functions fi : R^D → R.

In general, parametric statistical tests might lead to conclusions similar to those of non-parametric tests, but the former can also lead to incorrect conclusions because of dissimilarities in the results and the small size of the analysed sample. What is more, non-parametric tests do not require explicit conditions on the data, whereas the conditions required by parametric tests are typically not satisfied here. For instance, the Friedman test for D = 2 and 100 evaluations gives a p-value of 2.044 × 10−26. This makes it possible to see whether there are global differences in the results. Indeed, a p-value lower than the level of significance α = 0.05 indicates significant differences among the algorithms. The individual differences may then be revealed by means of a post-hoc statistical analysis.
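The ranking underlying the Friedman test is simple to reproduce. A minimal Wolfram Language sketch, assuming a matrix errors whose element errors[[p, a]] is the final error of algorithm a on problem p (ties are ignored here for brevity; the actual test uses mid-ranks, so the numbers below may differ slightly from the tables):

(* per-problem ranks: 1 = smallest error on that problem *)
ranks = Ordering[Ordering[#]] & /@ errors;
meanRanks = Mean[N[ranks]];            (* mean rank of each algorithm          *)
{k, h} = {12, 28};                     (* number of algorithms, test functions *)
chiSqF = 12 h/(k (k + 1)) Total[(meanRanks - (k + 1)/2)^2]   (* Friedman statistic *)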

Table 8.65 presents the rankings coming from the Friedman test and the corresponding positions of the algorithms. It is evident that the best solutions are reached by the APSO1 algorithm. However, the differences among APSO1, DE/B, PSO and GSA are insignificant for D = 2 and 100 evaluations. This fact is also displayed in figure 8.89, which presents the ranking from table 8.65 (the lower the better). The horizontal lines for α = 0.05 and 0.1 represent the threshold for the best performing algorithms. The threshold height equals the lowest rank increased by the corresponding critical difference CD_α calculated by the Bonferroni–Dunn method

CD_\alpha = q \sqrt{\frac{k(k+1)}{6h}}.    (8.87)

In the above, k = 12 stands for the number of algorithms and h = 28 for the number of test functions. The critical value q for a multiple non-parametric comparison is taken from statistical tables [32]. If a bar exceeds these lines, this simply means that the associated algorithm performs significantly worse in comparison with the algorithm associated with the lowest bar [8]. According to figure 8.89, the APSO1 algorithm significantly outperforms GA, DE/R, APSO2, FA, CS, BA, FPA/R and FPA/B (D = 2 and 100 evaluations). APSO1, however, cannot outperform DE/B, PSO and GSA. Similar conclusions can be drawn from the Wilcoxon signed-rank test, which conducts individual comparisons between two algorithms rather than multiple comparisons. The Wilcoxon test p-values are presented in table 8.66. APSO1 outperforms GA, DE/R, APSO2, FA, CS, BA, FPA/R and FPA/B. However, APSO1 cannot outperform DE/B, PSO and GSA.
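As a quick arithmetic check of (8.87): with k = 12 and h = 28 the square-root factor is √(12 · 13/(6 · 28)) ≈ 0.964, so CD_α ≈ 0.964 q for whatever critical value q is read from the tables. A one-line Wolfram Language sketch (q is left as a parameter because its value depends on α and k):

cdAlpha[q_] := With[{k = 12, h = 28}, q Sqrt[k (k + 1)/(6 h)]]   (* equation (8.87) *)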


8.4.1 D = 2, 100 evaluations

As mentioned earlier, Friedman's rank test p-value is 2.607 × 10−41 for the multiple-problem analysis. According to figure 8.89 and table 8.65, the APSO1 algorithm significantly outperforms GA, DE/R, APSO2, FA, CS, BA, FPA/R and FPA/B, but it cannot outperform DE/B, PSO and GSA. According to table 8.66, APSO1 outperforms GA, DE/R, APSO2, FA, CS, BA, FPA/R and FPA/B. However, APSO1 cannot outperform DE/B, PSO and GSA.

Table 8.65: Friedman's ranks R and positions P as a function of an algorithm

Algorithm   R        P
GA          8.964    10
DE/R        7.500    8
DE/B        5.036    5
PSO         3.500    2
APSO1       2.857    1
APSO2       4.643    4
FA          6.143    7
CS          8.321    9
BA          6.036    6
FPA/R       10.786   12
FPA/B       10.000   11
GSA         4.214    3

Table 8.66: Wilcoxon test p-values as a function of an algorithm

Algorithm   p-value
GA          0.000013
DE/R        0.000180
DE/B        0.223121
PSO         0.356402
APSO1       −
APSO2       0.000783
FA          0.002551
CS          0.000024
BA          0.029655
FPA/R       4.003 × 10−6
FPA/B       0.000022
GSA         0.624427

Figure 8.89: D = 2, 100 evaluations (Friedman positions of the algorithms; the horizontal lines mark the α = 0.05 and α = 0.1 thresholds)


8.4.2 D = 2, 400 evaluations

Friedman’s rank test p-value is 2.044×10−26 for multiple-problem analysis. Accordingto figure 8.90 and table 8.67 PSO algorithm outperforms significantly GA, DE/R, FA,CS, BA, FPA/R, FPA/B but it cannot outperform DE/B, APSO1, APSO2 and GSA.According to table 8.68 PSO outperforms GA, DE/R, FA, CS, BA, FPA/R, FPA/B.However, PSO cannot outperform DE/B, APSO1, APSO2 and GSA.

Table 8.67: Friedman's ranks R and positions P as a function of an algorithm

Algorithm   R        P
GA          7.179    8
DE/R        8.071    10
DE/B        4.429    2
PSO         3.357    1
APSO1       4.750    4
APSO2       4.571    3
FA          6.429    7
CS          8.821    11
BA          7.857    9
FPA/R       11.250   12
FPA/B       6.250    6
GSA         5.036    5

Table 8.68: Wilcoxon test p-values as a function of an algorithm

Algorithm   p-value
GA          0.006967
DE/R        0.001000
DE/B        0.148181
PSO         −
APSO1       0.175450
APSO2       0.182818
FA          0.011116
CS          0.000663
BA          0.001272
FPA/R       4.003 × 10−6
FPA/B       0.009125
GSA         0.405885

Figure 8.90: D = 2, 400 evaluations (Friedman positions of the algorithms; the horizontal lines mark the α = 0.05 and α = 0.1 thresholds)


8.4.3 D = 2, 2000 evaluations

According to figure 8.91 and table 8.69, the DE/R algorithm significantly outperforms GA, DE/B, APSO1, APSO2, FA, BA, FPA/R, FPA/B and GSA, but it cannot outperform PSO and CS. According to table 8.70, DE/R outperforms GA, DE/B, APSO1, FA, BA, FPA/R, FPA/B and GSA. However, DE/R cannot outperform PSO, APSO2 and CS.

Table 8.69: Friedman's ranks R and positions P as a function of an algorithm

Algorithm   R       P
GA          8.000   10
DE/R        2.893   1
DE/B        5.607   4
PSO         4.071   2
APSO1       6.679   7
APSO2       5.679   5
FA          6.107   6
CS          4.929   3
BA          7.607   9
FPA/R       9.607   11
FPA/B       9.679   12
GSA         6.750   8

Table 8.70: Wilcoxon test p-values as a function of an algorithm

Algorithm   p-value
GA          0.000032
DE/R        −
DE/B        0.000737
PSO         0.073848
APSO1       0.000610
APSO2       0.054331
FA          0.002749
CS          0.051540
BA          0.000064
FPA/R       0.000024
FPA/B       0.000012
GSA         0.000039

Figure 8.91: D = 2, 2000 evaluations (Friedman positions of the algorithms; the horizontal lines mark the α = 0.05 and α = 0.1 thresholds)


8.4.4 D = 10, 10⁴ evaluations

According to figure 8.92 and table 8.71, the FA algorithm significantly outperforms DE/R, DE/B, APSO1, APSO2, CS, BA, FPA/R and FPA/B, but it cannot outperform GA, PSO and GSA. According to table 8.72, FA outperforms DE/R, DE/B, PSO, APSO1, APSO2, CS, BA, FPA/R, FPA/B and GSA. However, FA cannot outperform GA.

Table 8.71: Friedman's ranks R and positions P as a function of an algorithm

Algorithm   R        P
GA          3.607    2
DE/R        6.285    6
DE/B        6.321    7
PSO         4.321    3
APSO1       7.750    10
APSO2       7.571    8
FA          2.821    1
CS          5.607    5
BA          7.607    9
FPA/R       10.464   11
FPA/B       10.678   12
GSA         4.964    4

Table 8.72: Wilcoxon test p-values as a function of an algorithm

Algorithm   p-value
GA          0.300156
DE/R        0.019592
DE/B        0.0039693
PSO         0.0036918
APSO1       0.0007207
APSO2       0.0010002
FA          −
CS          0.0314053
BA          0.0009222
FPA/R       0.0000530
FPA/B       4.003 × 10−6
GSA         0.0013772

Figure 8.92: D = 10, 10⁴ evaluations (Friedman positions of the algorithms; the horizontal lines mark the α = 0.05 and α = 0.1 thresholds)


Bibliography

[1] Back, T., Fogel, D. B, Michalewicz, Z., (Eds.) 2000. “Evolutionary computation2 – advanced algorithms and operators” Bristol and Philadelphia: Institute ofPhysics Publishing Ltd.

[2] Bronshtein, I. N., Semendyayev, K.A., Musiol, G., Muhlig, H. 2007. “Handbookof mathematics” Berlin, Heidelberg: Springer-Verlag

[3] Eiben, A. E., Smith, J. E. 2003. “Introduction to evolutionary computing” Berlin:Springer-Verlag

[4] Eiben, A. E., Smit, S. K. 2011. “Parameter tuning for configuring and analyzingevolutionary algorithms” Swarm and Evolutionary Computation 1 (1): 19–31

[5] Elsgolc, L.D. 2007. “Calculus of variations” New York: Dover Publications, Inc.

[6] Fister, I., Fister, I. Jr., Yang, X. S., Brest, J. 2013. “A comprehensive review offirefly algorithms” Swarm and Evolutionary Computation 13 (1): 34–46

[7] Fister, I. Jr., Yang, X. S., Fister, I., Brest, J., Fister, D. 2013. “A brief review ofnature-inspired algorithms for optimization” Elektrotehniski Vestnik 80 (3): 1–7

[8] García, S., Fernández, A., Benítez, A.D., Herrera, F. 2007. "Statistical Comparisons by Means of Non-Parametric Tests: A Case Study on Genetic Based Machine Learning" Proceedings of the II Congreso Español de Informática (CEDI 2007)

[9] García, S., Molina, D., Lozano, M., Herrera, F. 2009. "A study on the use of non-parametric tests for analyzing the evolutionary algorithms' behaviour: a case study on the CEC'2005 Special Session on Real Parameter Optimization" Journal of Heuristics 15: 617–644

[10] Goldberg, D. E. 1989. “Genetic algorithms in search, optimization and machinelearning” Boston, MA: Addison-Wesley

[11] Halton, J.H. 1960. “On the efficiency of certain quasi-random sequences of pointsin evaluating multi-dimensional integrals” Numerische Mathematik 2: 84–90

[12] Holland, J.H. 1962. “Outline for a logical theory of adaptive systems” Journal ofthe Association for Computing Machinery 3: 297–314


[13] Kennedy, J., Eberhart, R. 1995. “Particle swarm optimization” Proceedings ofIEEE International Conference on Neural Networks IV. 1942–1948

[14] Kirkpatrick, S., Gelatt Jr, C. D., Vecchi, M. P. 1983. “Optimization by simulatedannealing” Science 220 (4598): 671–680

[15] Liang, J. J., Qu, B. Y., Suganthan, P. N., Hernández-Díaz, A. G. 2013. "Problem definitions and evaluation criteria for the CEC 2013 special session on real-parameter optimization" (Technical Report 201212), China: Zhengzhou University, and Singapore: Nanyang Technological University

[16] Mantegna, R. N. 1994. “Fast, accurate algorithm for numerical simulation ofLevy stable stochastic processes” Physical Review E 49 (5): 4677–4683

[17] Michalewicz, Z. 1996. “Genetic algorithms + data structures = evolution pro-grams” 3rd ed. Berlin, Heidelberg, New York: Springer

[18] Price, K. V., Storn, R., and Lampinen, J. 2005 “Differential evolution: A prac-tical approach to global optimization” Berlin: Springer-Verlag

[19] Rashedi, E., Nezamabadi-pour, H., Saryazdi, S. 2009. “GSA: A gravitationalsearch algorithm” Information Sciences 179: 2232–2248

[20] Storn, R., Price, K. 1997. “Differential evolution-A simple and efficient heuristicfor global optimization over continuous spaces” Journal of Global Optimization,11: 341–359

[21] Tesch, K., Atherton, M.A., Karayiannis, T.G., Collins, M.W., Edwards, P. 2009. "Determining heat transfer coefficients using evolutionary algorithms" Engineering Optimization 41 (9): 855–870

[22] Tesch, K. 2010. “On some extensions of Murray’s law” TASK Quarterly 14 (3):57–65

[23] Tesch, K., Banaszek, M. 2011. “A variational method of finding streamlines inring cascades for creeping flows” TASK Quarterly 15 (1): 71–84

[24] Tesch, K., Kaczorowska, K. 2016. “Arterial cannula shape optimization by meansof the rotational firefly algorithm” Engineering Optimization 48 (3): 497–518

[25] Thiemard, E., 2001. “An algorithm to compute bounds for the star discrepancy”Journal of Complexity, 17: 850–880.

[26] Yang, X.S., 2008. “Nature-inspired metaheuristic algorithms” Frome, UK: Lu-niver Press

[27] Yang, X.S., Deb, S. 2009. "Cuckoo search via Lévy flights" In: Proceedings of world congress on nature & biologically inspired computing (NaBIC 2009), USA: IEEE Publications, 210–214


[28] Yang, X.S. 2010. “A new metaheuristic bat-inspired algorithm” In: Cruz C,Gonzalez, J.R., Pelta, D.A., Terrazas, G., (Eds). Nature inspired cooperativestrategies for optimization (NISCO 2010). Studies in computational intelligence.Berlin, Germany: Springer, 65–74

[29] Yang, X.S. 2012. “Flower pollination algorithm for global optimization” In: Un-conventional computation and natural computation. Lecture notes in computerscience, 7445, 240–49

[30] Yang, X.S., Deb, S., Fong, S. 2011. “Accelerated particle swarm optimizationand support vector machine for business optimization and applications” In: Net-worked digital technologies. Communications in computer and information sci-ence, 136. Berlin, Germany: Springer, 53–66

[31] Yang, X. S. 2009. “Firefly algorithm, stochastic test functions and design opti-misation” International Journal of Bio-inspired Computation 2 (2): 78–84

[32] Zar, J.H. 1999. “Biostatistical analysis” New Jersey: Prentice Hall

[33] Zitzler, E., Thiele, L. 1999. “Multiobjective evolutionary algorithms: A compara-tive study and the strength pareto approach” IEEE Transaction on EvolutionaryComputation, 3 (4): 257–271


Appendix A

Codes

This appendix contains working examples of single-point and multi-point, derivative-free algorithm codes. Implementations are provided in the Mathematica programming language. It has to be pointed out that readability and comprehensibility of the implementations are preferred to efficiency and code length.

A.1 Single-point, derivative-free algorithms

d = 2; L = Table0, d; U = Tableπ, d;

f = -5 i=1

d

Sin#1i - i=1

d

Sin7 #1i &;

n = 200;

α =1

Sqrtd n;

x = Table0, Table0, d, n;x〚1, 2〛 = L + (U - L) * RandomVariateUniformDistribution[0, 1], d;x〚1, 1〛 = f[x〚1, 2〛];g = x〚1〛;Fori = 2, i ≤ n, i++,

xi, 2 = xi - 1, 2 + α (U - L) * RandomVariateNormalDistribution[0, 1], d;xi, 1 = fxi, 2;Ifxi, 1 ≤ g〚1〛, g = xi;

Print"fb(", g〚2〛, ")=", g〚1〛

Figure A.1: Uncontrolled random walk code


d = 2; L = Table0, d; U = Tableπ, d;

f = -5 i=1

d

Sin#1i - i=1

d

Sin7 #1i &;

n = 200;

α =1

Sqrtd n;

x = Table0, Table0, d, n;x〚1, 2〛 = L + (U - L) * RandomVariateUniformDistribution[0, 1], d;x〚1, 1〛 = f[x〚1, 2〛];g = x〚1〛;Fori = 2, i ≤ n, i++,

within = False;

While¬ within,

xi, 2 = xi - 1, 2 + α (U - L) * RandomVariateNormalDistribution[0, 1], d;within = IsWithinxi

;xi, 1 = fxi, 2;Ifxi, 1 ≤ g〚1〛, g = xi;

Print"fb(", g〚2〛, ")=", g〚1〛

Figure A.2: Domain controlled random walk code

(* True if the decision vector x1[[2]] lies within the box [L, U] *)
IsWithin = Function[{x1},
  b = True;
  For[j = 1, j <= d, j++,
    b = And[b, L[[j]] <= x1[[2, j]] <= U[[j]]];
  ];
  b
];

Figure A.3: Within or without?


d = 2; L = Table0, d; U = Tableπ, d;

f = -5 i=1

d

Sin#1i - i=1

d

Sin7 #1i &;

n = 200;

α =1

10 Sqrtd;

x = Table0, Table0, d, n;x〚1, 2〛 = L + (U - L) * RandomVariateUniformDistribution[0, 1], d;x〚1, 1〛 = f[x〚1, 2〛];g = x〚1〛;Fori = 2, i ≤ n, i++,

xi, 2 = g〚2〛 + α (U - L) * RandomVariateNormalDistribution[0, 1], d;xi, 1 = fxi, 2;Ifxi, 1 ≤ g〚1〛, g = xi;

Print"fb(", g〚2〛, ")=", g〚1〛

Figure A.4: Position controlled random walk code

d = 2; L = Table0, d; U = Tableπ, d;

f = -5 i=1

d

Sin#1i - i=1

d

Sin7 #1i &;

n = 200;

δ = 10-4; T = 1; α =1

10;

x = Table0, Table0, d, n;x〚1, 2〛 = L + (U - L) * RandomVariateUniformDistribution[0, 1], d;x〚1, 1〛 = f[x〚1, 2〛];g = l = x〚1〛;Fori = 2, i ≤ n, i++,

T *= δ1/n;xi, 2 = l〚2〛 + α (U - L) * RandomVariateNormalDistribution[0, 1], d;xi, 1 = fxi, 2;Δ = xi, 1 - l〚1〛;IfΔ < 0 || Exp- Δ

T > Random[], l = xi;

Ifl〚1〛 < g〚1〛, g = l;Print"fb(", g〚2〛, ")=", g〚1〛

Figure A.5: Simulated annealing code


A.2 Multi-point, derivative-free algorithms

d = 2; L = Table0, d; U = Tableπ, d;

f = -5 i=1

d

Sin#1i - i=1

d

Sin7 #1i &;

Nn = 20; n = 20; α = 1; G0 = 1;

v = a = TableTable0, d, Nn;M = Table[0, Nn];

x = Table0, L + (U - L) * RandomVariateUniformDistribution[0, 1], d, Nn;DoForj = 1, j ≤ Nn, j++, xj, 1 = fxj, 2;b, w = Sort[x, #1〚1〛 < #2〚1〛 &]〚1, Nn〛;Ifk ⩵ 1, g = b;Ifb〚1〛 < g〚1〛, g = b;

M =xAll, 1 - w〚1〛

b〚1〛 - w〚1〛;

M =M

Total[M];

G = G0 ⅇ-α k/n;

e = TableTable0, d, Nn;Fori = 1, i ≤ Nn, i++,

Forj = 1, j ≤ Nn, j++,

Ifi ≠ j, ei +=Mj RandomReal1, d * xj, 2 - xi, 2

Normxi, 2 - xj, 21 + 2 × 10-16

;;v = RandomReal1, Nn, d * v + e G;

xAll, 2 += v;

, k, 1, nPrint"fb(", g〚2〛, ")=", g〚1〛

Figure A.6: Gravitational search algorithm code


d = 2; L = Table0, d; U = Tableπ, d;

f = -5 i=1

d

Sin#1i - i=1

d

Sin7 #1i &;

Nn = 20; n = 20; pC = 0.7; pM = 0.15; sT = 3;

x = y = Table0, L + (U - L) * RandomVariateUniformDistribution[0, 1], d, Nn;Fori = 1, i ≤ Nn, i++, xi, 1 = fxi, 2;g = Sort[x, #1〚1〛 < #2〚1〛 &]〚1〛;

DoForj = 1, j ≤ Nn, j += 2,

p1 = x〚Tournament[sT]〛;p2 = x〚Tournament[sT]〛;

c1, c2 = Crossover[p1, p2];

yj = Mutationc1, i;yj + 1 = Mutationc2, i;

;

Forj = 1, j ≤ Nn, j++, yj, 1 = fyj, 2;l = Sort[y, #1〚1〛 < #2〚1〛 &]〚1〛;Ifl〚1〛 < g〚1〛, g = l;

x = SurvSelection[x, y];

, i, 2, nPrint"fb(", g〚2〛, ")=", g〚1〛

Figure A.7: Genetic algorithm code

Tournament = FunctionTour,k = RandomInteger[1, Nn];

Forl = 1, l < Tour, l++,

m = RandomInteger[1, Nn];

Ifx〚m, 1〛 < xk, 1, k = m;;k

;

Figure A.8: GA parent selection (tournament) code


Crossover = Functionx1, x2,

y1 = x1;

y2 = x2;

IfRandomReal[] < pC,

a = RandomReal[];

y1〚2〛 = a x1〚2〛 + (1 - a) x2〚2〛;y2〚2〛 = a x2〚2〛 + (1 - a) x1〚2〛;

;y1, y2

;

Figure A.9: GA crossover code

Mutation = Functionx1, i, n,y1 = x1;

Fork = 1, k ≤ d, k++,

IfRandomReal[] < pM,

y12, k = Lk + RandomReal[] Uk - Lk;

;y1

;

Figure A.10: GA uniform mutation code

Mutation = Functionx1, i,y1 = x1;

Fork = 1, k ≤ d, k++,

IfRandomReal[] < pM,

Δ = 1 - RandomReal[](1-i/n)2;

IfRandomInteger[] ⩵ 0,

y12, k += Uk - y12, k Δ,y12, k -= y12, k - Lk Δ

;;

;y1

;

Figure A.11: GA nonuniform mutation code


SurvSelection = Functionx, y,

x1 = Join[x, y];

x1 = Sort[x1, #1〚1〛 < #2〚1〛 &];

x1 = x1〚1 ;; Nn〛;x1

;

Figure A.12: GA (µ+ λ) strategy code

SurvSelection = Function[x, y,

y

];

Figure A.13: GA (µ, λ) strategy code

d = 2; L = Table0, d; U = Tableπ, d;

f = -5 i=1

d

Sin#1i - i=1

d

Sin7 #1i &;

Nn = 20; n = 20;

CR = 0.9; F = 0.7;

x = y = Table0, L + (U - L) * RandomVariateUniformDistribution[0, 1], d, Nn;Fori = 1, i ≤ Nn, i++,

xi, 1 = fxi, 2;g = Sort[x, #1〚1〛 < #2〚1〛 &]〚1〛;

Fori = 2, i ≤ n, i++,

Forj = 1, j ≤ Nn, j++,

R = RandomSampleComplementRange[Nn], j;K = TableHeavisideThetaCR - RandomReal[], d;KRandomInteger1, d = 1;

yj, 2 = K * (x〚R〚3〛, 2〛 + F (x〚R〚1〛, 2〛 - x〚R〚2〛, 2〛)) + (1 - K) * xj, 2;;Forj = 1, j ≤ Nn, j++,

yj, 1 = fyj, 2;Ifyj, 1 ≤ xj, 1, xj = yj;Ifyj, 1 ≤ g〚1〛, g = yj;

;Print"fb(", g〚2〛, ")=", g〚1〛

Figure A.14: Differential evolution code


d = 2; L = Table0, d; U = Tableπ, d;

f = -5 i=1

d

Sin#1i - i=1

d

Sin7 #1i &;

Nn = 20; n = 20;

p = 0.8; λ = 1.5; α = 0.01;

σu =Gamma[1 + λ] Sin π λ

2

Gamma 1+λ2

λ 2(λ-1)/2

1/λ

;

x = Table0, L + (U - L) * RandomVariateUniformDistribution[0, 1], d, Nn;Fori = 1, i ≤ Nn, i++, xi, 1 = fxi, 2g = y = Sort[x, #1〚1〛 < #2〚1〛 &]〚1〛;Fori = 2, i ≤ n, i++,

Forj = 1, j ≤ Nn, j++,

Ifp < RandomReal[],

u = σu RandomVariateNormalDistribution[0, 1], d;v = RandomVariateNormalDistribution[0, 1], d;y〚2〛 = xj, 2 + α u

Abs[v]1/λ* g〚2〛 - xj, 2,

R = RandomSample[Range[Nn]];

ϵ = RandomVariateUniformDistribution[0, 1];y〚2〛 = xj, 2 + ϵ (x〚R〚1〛, 2〛 - x〚R〚2〛, 2〛)

;y = CheckRange[y];

y〚1〛 = f[y〚2〛];Ify〚1〛 < xj, 1,xj, 2 = y〚2〛;If[y〚1〛 < g〚1〛, g = y];

;;

Print"fb(", g〚2〛, ")=", g〚1〛

Figure A.15: Flower pollination algorithm code

(* Clips the decision vector x1[[2]] back into the box [L, U] *)
CheckRange = Function[{x1},
  y1 = x1;
  For[l = 1, l <= d, l++,
    If[y1[[2, l]] < L[[l]], y1[[2, l]] = L[[l]]];
    If[y1[[2, l]] > U[[l]], y1[[2, l]] = U[[l]]];
  ];
  y1
];

Figure A.16: Check range code


d = 2; L = Table0, d; U = Tableπ, d;

f = -5 i=1

d

Sin#1i - i=1

d

Sin7 #1i &;

Nn = 20; n = 20;

α = 1; β = 1; θ = 0.3; θ0 = 0.7; δ = 10-4;

v = TableTable0, j, 1, d, Nn;x = Table0, L + (U - L) * RandomVariateUniformDistribution[0, 1], d, Nn;Fori = 1, i ≤ Nn, i++, xi, 1 = fxi, 2l = x;

g = Sortl, #1〚1〛 < #2〚1〛 &〚1〛;Fori = 2, i ≤ n, i++,

θ *= δ1/n;Forj = 1, j ≤ Nn, j++,

vj = (θ0 + θ) vj + β RandomReal1, d * g〚2〛 - xj, 2 +

α RandomReal1, d * lj, 2 - xj, 2;xj, 2 += vj;xj, 1 = fxj, 2;Ifxj, 1 < lj, 1,lj = xj;Iflj, 1 < g〚1〛, g = lj;

;;

Print"fb(", g〚2〛, ")=", g〚1〛

Figure A.17: Particle swarm optimisation code


d = 2;

L = Table0, d;U = Tableπ, d;

f = -5 i=1

d

Sin#1i - i=1

d

Sin7 #1i &;

Nn = 20; n = 20;

α = 0.1; β = 0.3; θ = 0.5; θ0 = 0.1; δ = 10-4;

v = TableTable0, d, Nn;x = Table0, L + (U - L) * RandomVariateUniformDistribution[0, 1], d, Nn;Fori = 1, i ≤ Nn, i++, xi, 1 = fxi, 2g = Sort[x, #1〚1〛 < #2〚1〛 &]〚1〛;Fori = 2, i ≤ n, i++,

θ *= δ1/n;Forj = 1, j ≤ Nn, j++,

vj = (θ0 + θ) vj + α (U - L) * RandomReal1, d -1

2+ β g〚2〛 - xj, 2;

xj, 2 += vj;xj, 1 = fxj, 2;Ifxj, 1 < g〚1〛, g = xj;

;Print"fb(", g〚2〛, ")=", g〚1〛

Figure A.18: Accelerated particle swarm optimisation 1 code

d = 2; L = Table0, d; U = Tableπ, d;

f = -5 i=1

d

Sin#1i - i=1

d

Sin7 #1i &;

Nn = 20; n = 20;

β = 0.2; α = 0.5; α0 = 0.1; δ = 10-4;

x = Table0, L + (U - L) * RandomVariateUniformDistribution[0, 1], d, Nn;Fori = 1, i ≤ Nn, i++, xi, 1 = fxi, 2g = Sort[x, #1〚1〛 < #2〚1〛 &]〚1〛;Fori = 2, i ≤ n, i++,

α *= δ1/n;Forj = 1, j ≤ Nn, j++,

xj, 2 += (α0 + α) (U - L) * RandomReal1, d -1

2+ β g〚2〛 - xj, 2;

xj, 1 = fxj, 2;Ifxj, 1 < g〚1〛, g = xj;

;Print"fb(", g〚2〛, ")=", g〚1〛

Figure A.19: Accelerated particle swarm optimisation 2 code


d = 2; L = Table0, d; U = Tableπ, d;

f = -5 i=1

d

Sin#1i - i=1

d

Sin7 #1i &;

Nn = 20; n = 20;

α = 0.5; β0 = 0.2; β = 0.8; γ = 1; δ = 10-3;

x = Table0, L + (U - L) * RandomVariateUniformDistribution[0, 1], d, Nn;DoFori = 1, i ≤ Nn, i++, xi, 1 = fxi, 2;x = y = Sort[x, #1〚1〛 < #2〚1〛 &];

g = x〚1〛;α *= δ1/n;Fori = 1, i ≤ Nn, i++,

Forj = 1, j < i, j++,

xi, 2 += β0 + β Exp-γ Normxj, 2 - xi, 22 yj, 2 - xi, 2 +

α (U - L) * RandomVariateUniformDistribution- 12,1

2, d;

;;, k, 1, n

Print"fb(", g〚2〛, ")=", g〚1〛

Figure A.20: Firefly algorithm code

d = 2; L = Table0, d; U = Tableπ, d;

f = -5 i=1

d

Sin#1i - i=1

d

Sin7 #1i &;

Nn = 20; n = 20;

A = 0.5; r = 0.5; Fl = 0; Fu = 1; α = 0.02;

v = TableTable0, d, Nn;x = Table0, L + (U - L) * RandomVariateUniformDistribution[0, 1], d, Nn;Fori = 1, i ≤ Nn, i++, xi, 1 = fxi, 2y = g = Sort[x, #1〚1〛 < #2〚1〛 &]〚1〛;Fori = 2, i ≤ n, i++,

Forj = 1, j ≤ Nn, j++,

Ifr < RandomReal[],

y〚2〛 = g〚2〛 + α (U - L) * RandomVariateNormalDistribution[0, 1], d,F = Fl + Fu - Fl RandomReal[];vj += g〚2〛 - xj, 2 F;y〚2〛 = xj, 2 + vj;

;y〚1〛 = f[y〚2〛];Ify〚1〛 ≤ xj, 1 && RandomReal[] < A, xj, 2 = y〚2〛;If[y〚1〛 < g〚1〛, g = y];

;Print"fb(", g〚2〛, ")=", g〚1〛

Figure A.21: Bat algorithm code


d = 2; L = Table0, d; U = Tableπ, d;

f = -5 i=1

d

Sin#1i - i=1

d

Sin7 #1i &;

Nn = 20; n = 400;

α = 0.2; λ = 1.5; pa = 0.4; F = Nn;

σu =Gamma[1 + λ] Sin π λ

2

Gamma 1+λ2

λ 2(λ-1)/2

1/λ

;

x = y = Table0, L + (U - L) * RandomVariateUniformDistribution[0, 1], d, Nn;Fori = 1, i ≤ Nn, i++, xi, 1 = fxi, 2g = Sort[x, #1〚1〛 < #2〚1〛 &]〚1〛;WhileF < n,

Fori = 1, i ≤ Nn, i++,

u = σu RandomVariateNormalDistribution[0, 1];v = RandomVariateNormalDistribution[0, 1];ϵ = RandomVariateNormalDistribution[0, 1], d;yi, 2 = xi, 2 + α u

Abs[v]1/λ(U - L) *

ϵNorm[ϵ]

;

F++;

yi, 1 = fyi, 2;Ifyi, 1 < xi, 1, xi = yi;

;R1 = RandomSample[Range[Nn]];

R2 = RandomSample[Range[Nn]];

Fori = 1, i ≤ Nn, i++,

ϵ = α RandomVariateUniformDistribution[0, 1], d;yi, 2 = xi, 2 + ϵ * xR1i, 2 - xR2i, 2 HeavisideThetapa - Random[];Ifxi, 2 ≠ yi, 2, F++;

yi, 1 = fyi, 2;Ifyi, 1 < xi, 1, xi = yi;

;g = Sort[x, #1〚1〛 < #2〚1〛 &]〚1〛;

Print"fb(", g〚2〛, ")=", g〚1〛

Figure A.22: Cuckoo search code


A.3 Miscellaneous

(* i-th element of the Halton sequence with base p (radical inverse) *)
halton = Function[{i, p},
  h = 0; ff = 1; j = i;
  While[j > 0,
    ff /= p;
    h += ff Mod[j, p];
    j = Floor[j/p];
  ];
  h
];

Figure A.23: Halton sequence code


Appendix B

AGA – Advanced Genetic Algorithm

B.1 Brief introduction

The program AGA1) (Advanced Genetic Algorithms) can be used to solve the great majority of single-objective optimisation problems. It is also possible to solve multi-objective problems, provided that they can be reduced to a single objective. A short description of this program is presented below on the basis of the optimisation of a function of two variables, f : R² ⊇ [0, π]² → R, given by equation (1.17) and shown in figure 1.1. The minimum value is

\min_{[0,\pi]^2} f = f\left( \frac{\pi}{2}, \frac{\pi}{2} \right) = -6.    (B.1)

The easiest way to find that minimum is to write a script file in a text editor in the following format:

[VARIABLES]

x = real [0; 3.1415]

y = real [0; 3.1415]

[FUNCTION]

analytical = 1

[EQUATIONS]

f = -5.*sin(x)*sin(y) - sin(5*x)*sin(5*y)

[SHOW]

arrange = 1

[OPTIONS]

maximisation = 0

loop_counter = 100

max_iteration = 100

[SELECTION]

1) http://www.pg.gda.pl/∼krzyte/ga/aga.zip


tournament = 1

tournament_size = 3

[CROSSOVER]

probability = 0.7

arithmetical = 1

[MUTATION]

probability = 0.15

nonuniform = 1

[POPULATION]

size = 30

constatnt = 1

[PLOTS]

Convergence = Average;CurrentMin

D = Discrepancy

Different_vs_2^Entropy = Different;2^Entropy

The next step is to save it as ‘test.txt’; it can then be run by typing ‘aga test.txt’ at the command line. Finally, a window similar to figure B.1 will appear.

Figure B.1: Main window

AGA found the minimum, called ‘GlobalMin’ (see the window named ‘Statistics’): f(1.57078, 1.57081) = −6. It is very close to the exact value. If one does not want to use the command line, it is necessary to run AGA first; a window like the one in figure B.2 appears. Next, option ‘Open’ from menu ‘File’ should be chosen, file ‘test.txt’ highlighted and the ‘Open’ button pressed.

B.2 Detailed introduction

Instead of writing a script file, it is possible to define the optimisation problem manually. In order to do this, menu ‘File’ (figure B.3) should be chosen.

• ‘New...’ – defines a new problem


Figure B.2: Empty main window

Figure B.3: Menu File

Figure B.4: I/O console

• ‘Open...’ – opens an existing problem definition from a file. There are two types of file:
  – *.txt – script files (text)
  – *.aga – AGA internal files (binary)
• ‘Close all’ – closes all the windows and finishes the optimisation
• ‘Save’ – saves the current problem
• ‘Save as...’ – saves the current problem under a different name
• ‘Run’ – runs the optimisation process
• ‘Terminate’ – terminates the running process
• ‘About...’ – a short note about the author
• ‘Exit’ – exits the application

When option ‘New...’ from menu ‘File’ is chosen, a dialog box called ‘Initialise’ appears. It consists of five pages. The first page, ‘Variables’ (figure B.5), is used to declare the optimisation variables (in the case of function optimisation – the independent variables).

Figure B.5: Dialog box Initialise – first tab

Figure B.6: Dialog box Initialise – second tab

There are three types of variables:

• ‘Boolean’ – logical variable. Two values are possible: 0 (false) and 1 (true).

• ‘Integer’ – integer variable. Values from the subset of integer numbers [−2³¹; 2³¹ − 1] are possible. If an integer variable has been chosen, it is necessary to specify the range of the desired subset. There are two edit windows: ‘from’ and ‘to’.

• ‘Real’ – floating-point (double-precision) variable. Values from the subset of floating-point numbers, approximately [−1.8 × 10³⁰⁸; 1.8 × 10³⁰⁸], are possible. In this case it is also necessary to specify the range of the desired subset. There are two edit windows: ‘from’ and ‘to’. After filling them, the range of the subset is given by [From; To].

When the edit windows ‘from’ and ‘to’ are filled, the next step is to add the variables to the list. In order to do this, the button ‘Add’ should be used. If one wants to remove a variable from the list, it should be highlighted and the button ‘Remove’ pushed. In order to name a variable, it is enough to click on the empty place in the column ‘Name’. If the name is not specified, the program will assume a default name (x1, x2, . . .). In our case we can name the variables x and y.

The second page, ‘Function’ (figure B.6), of the dialog box ‘Initialise’ determines the entry method for the fitness function values. There are three possibilities:

• ‘Manual’ – individual values are entered by hand
• ‘Analytical’ – an analytical equation can be provided
• ‘External’ – values are supplied automatically by an external program (*.exe or *.bat file).

With manual entry, the fitness function values that have not yet been calculated must be supplied at every step of the optimisation process. We decided to deliver the fitness function values by an analytical equation. In order to do this, the following equation should be provided in the edit box ‘f(. . .) =’:

-5.*sin(x)*sin(y) - sin(5*x)*sin(5*y)

If the description of the function is more complicated, one has to click the button ‘I/O’, which will open the I/O console (figure B.4). For a more detailed description see the ‘I/O Console’ section B.3. The function must be called ‘f’.

If the fitness function values are delivered by an external program, the name of the text file into which AGA will write the independent variables generated by the GA must be specified (‘Data for external program (variables from AGA)’). The default name of this file is ‘input.txt’. The external program must calculate the value of the function for these variables. The name of the executable (*.exe) program or *.bat file which is going to calculate these values must also be specified (‘External program’). The external program has to save the calculated value to another text file, which will be read by AGA in order to continue the optimisation process (‘Data from external program (function value for AGA)’). The default name of this file is ‘output.txt’. This makes it possible to create a fully automated optimisation process. In our example, the source code looks as follows:

#include <fstream>
#include <cmath>

using namespace std;

int main(int /* argc */, char *argv[])
{
    ifstream in(argv[1]);   // variables written by AGA
    ofstream out(argv[2]);  // function value read back by AGA

    double x1, x2, f;

    in >> x1 >> x2;
    f = -5.*sin(x1)*sin(x2) - sin(5.*x1)*sin(5.*x2);
    out << f << endl;

    return 0;
}

It calculates the value of the function given by equation (1.17). The source code listed above should be compiled and the executable placed in the same folder as AGA.

Input data are read from the file specified as the first argument (argv[1]) and the result is saved into the file specified as the second argument (argv[2]). The AGA program will call the external program with command-line arguments, using the syntax:

External.exe input.txt output.txt
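Both files are plain text with whitespace-separated numbers. As a hypothetical illustration for the two-variable example above: if AGA writes

1.5708 1.5708

to ‘input.txt’, the external program reads these values, evaluates the function and writes the result (approximately −6) to ‘output.txt’.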

The third page, ‘General’ (figure B.7), allows us to choose the population size and the windows that will be shown during the optimisation process:

• ‘Statistics history’ – statistics will be shown for every iteration, or only for the last (current) one if this option is unchecked

• ‘Buffer individuals’ – fitness function values that have already been calculated can be buffered in order to save computation time. It works very well for integer and boolean variables.
  – ‘Show buffered’ – shows the buffered individuals with calculated fitness function values (if ‘Buffer individuals’ has been chosen)

• ‘Show population’ – shows the individuals of the last generation. It is important for manual delivery of the fitness function values, since those values have to be input here.
  – ‘Population history’ – as above (if ‘Show population’ has been chosen), but it shows all the generations

• ‘Show variables’ – displays basic information about the optimisation variables

• ‘(Initial) population size’ – the size of the population can be specified here. For the variable population size method it is only the initial population size (first step)

In our case we set ‘(Initial) population size’ to 30.

The fourth page, ‘Plots’ (figure B.8), makes it possible to specify the plots we want to see during the optimisation process. One plot, called ‘Convergence’, is predefined (for minimisation). The name we choose is inconsequential; if we do not name the plot then AGA will do it for us. In order to plot something we must input it in the window ‘Plot’ and click the button ‘Add’. If we change our mind, we can remove it using the button ‘Remove’.

We can plot variables or even equations. To recall the standard variable names, we click the button ‘?’ and the window shown in figure B.9 appears.

In order to input an equation, the variable names specified in the first column of this window should be used.


Figure B.7: Dialog box Initialise – third tab

Figure B.8: Dialog box Initialise – fourth tab

Figure B.9: I/O console

Figure B.10: Dialog box Initialise – fifth tab

If we want to see more than one plot in the same window, it is necessary to separate the equations using ‘;’. In our case we can add two additional plots. To do this we input

Discrepancy

in the edit window ‘Plot’, then we click the button ‘Add’ and finally change the plot’s name to

D

The second plot consists of two graphs that show the number of different individuals and the function 2^Entropy. In order to create it we input

Different;2^Entropy

and optionally change the name to


Different_vs_2^Entropy

The fifth page, ‘Export’ (figure B.10), allows us to export variables or equations to a file. The rules of exporting are the same as on the ‘Plots’ page. There are two differences: we have to specify a single equation (we do not use ‘;’ to separate), and we can export vector variables (whereas only scalars can be plotted).

In order to choose the name of the export file, we input it into ‘Export file name’. The check box ‘Every iteration’ allows us to export results from each iteration, or only from the last one if it is left unchecked.

It is also possible to modify the optimisation process options. In order to do this we choose either option ‘Options’ from menu ‘File’ or the relevant speed button. The dialog box ‘Options’ consists of five pages. The first of them, labelled ‘General’ (figure B.11), allows us to choose the maximal number of iterations, ‘Max iteration’, and to change the number of generations, ‘Loop counter’. The number of generations is important only for the external and analytical methods of supplying the function value. We can also decide here whether we maximise or minimise our problem – ‘Maximisation’. AGA can optionally be closed when the optimisation is finished; to do this we choose ‘Close AGA when finished’. This is useful when we call AGA from another program or want to fully automate our computation.

Figure B.11: Dialog box Options – first tab

Figure B.12: Dialog box Options – second tab

The second page of the dialog box ‘Options’, ‘Selection’ (figure B.12), allows us to choose the selection method. There are two methods so far:

• ‘Tournament’ – random tournament selection (default). We can specify the size of the tournaments, ‘Tournament size’. The default tournament size equals 3. Due to its virtues this method is recommended.

• ‘Roulette’ – roulette wheel selection. The roulette wheel method is a classical method of individual selection. It possesses more defects than virtues.

Both methods fit either maximisation or minimisation problems. This means that the roulette method is a modified version of the classical roulette.

The third page, ‘Crossover’ (figure B.13), contains the options necessary to change the crossover method. One can also change the ‘Crossover probability’. There are three crossover methods:

• ‘Multipoint’ – multipoint method. ‘Points’ indicates the number of crossover points. The default number is 1

• ‘Arithmetical’ – arithmetical crossover method. This method is suitable for numerical optimisation

• ‘Uniform’ – exchanges all the genes with ‘Uniform cross probability’

Figure B.13: Dialog box Options – third tab

Figure B.14: Dialog box Options – fourth tab

The fourth page, ‘Mutation’ (figure B.14), allows us to specify the ‘Mutation probability’ and to choose among the mutation methods:

• ‘Uniform’ – this method does not depend on the current generation number (stage of optimisation)

• ‘Nonuniform’ – this method depends on the current generation number. The later the generation, the smaller the influence of mutation. We can also change the ‘Uniformity coefficient’

The fifth page, ‘Population’ (figure B.15), lets us choose the population size. There are two possibilities:

• ‘Constant’ – the population size does not depend on the generation number – it is constant during the optimisation process

• ‘Variable’ – the population size varies during the optimisation process

Menu ‘Window’ (figure B.16) consists of at least five options for window manipulation:

• ‘Cascade’
• ‘Tile Horizontally’
• ‘Tile Vertically’
• ‘Minimize All’
• ‘Arrange All’

The speed toolbar (figure B.17) consists of the following buttons:

• ‘New’ – same as option ‘New’ from menu ‘File’
• ‘Open’ – same as option ‘Open’ from menu ‘File’
• ‘Save’ – same as option ‘Save’ from menu ‘File’


Figure B.15: Dialog box Options – fifth tab

Figure B.16: Menu window

• ‘Cascade’ – same as option ‘Cascade’ from menu ‘Window’
• ‘Tile Horizontally’ – same as option ‘Tile Horizontally’ from menu ‘Window’
• ‘Tile Vertically’ – same as option ‘Tile Vertically’ from menu ‘Window’
• ‘Run’ – same as option ‘Run’ from menu ‘File’
• ‘Terminate’ – same as option ‘Terminate’ from menu ‘File’
• ‘Options’ – same as option ‘Options’ from menu ‘File’

Figure B.17: Speed toolbar

If we have decided to input the function values manually, then we have to supply these values in the window ‘Population’ (figure B.18) for each set of independent variables listed by the program. First we click on the desired line, then we click again on the ‘?’. Finally we can input a value.

Figure B.18: Population window

Figure B.19: Chart window

Clicking the right mouse button on a chart window displays a popup menu (figure B.19). This menu allows us to modify the chart:

• ‘Save to file’ – saves the chart to a file as a bitmap
• ‘Copy to clipboard as’ – copies the chart to the clipboard as
  – ‘Bitmap’
  – ‘Metafile’ – enhanced metafile (‘emf’ format)
• ‘Show pitch’ – shows or hides the chart pitch
• ‘Precision’ – sets the number of decimal digits along the specified axis
  – ‘x’
  – ‘y’
• ‘Floating’ – allows us to change floating-point numbers to integers on the specified axis
  – ‘x’
  – ‘y’
• ‘Plot range’
• ‘Proportional’ – makes the chart proportional. This is useful when the x-axis and y-axis are similar or the same
• ‘Background colour’ – changes the background colour of the chart

B.3 I/O Console

There is a difference between capital and small letters. When a function does not give a unique value, the principal value is returned. We operate on tensors in general; scalars are tensors of valence zero. Complex numbers are represented in the form

a + b*I

where a is the real part and b the imaginary part. Real parts are represented as double-precision 64-bit numbers. Space characters are ignored. If the input value can be transformed into a numerical value, then AGA returns the numerical value. For instance

2 + pi / e

gives

3.15573

If the input value cannot be transformed into a numerical value, then AGA will try to simplify its parts and will treat it as a symbolic value. For instance

2 * pi / x

gives

6.28319/x

provided that we have not defined x yet.

Commands are characterised by square brackets [ ] and cannot be nested. Command list:

• List[] – lists all declared and predefined variables. The predefined variables are

  – pi = π
  – deg = π/180
  – I = √−1
  – e = lim_{n→∞} (1 + n⁻¹)ⁿ
  – Generation – generation number
  – Average – average fitness function value
  – StdDev – standard deviation
  – Mutations – number of mutations
  – Crossovers – number of crossovers
  – PopSize – population size
  – Different – number of different fitness function values
  – MeanDev – mean deviation
  – MedianDev – median deviation
  – Range = CurrentMax - CurrentMin
  – Q1 – lower quartile
  – Q2 – median
  – Q3 – upper quartile
  – IQR = Q3 - Q1 – interquartile range
  – Entropy
  – Discrepancy
  – GlobalMax – maximal fitness function value from all generations
  – GlobalMin – minimal fitness function value from all generations
  – CurrentMax – maximal fitness function value from the current generation
  – CurrentMin – minimal fitness function value from the current generation
  – CrossPoints – number of crossover points
  – TourSize – tournament size
  – Buffered – number of buffered individuals
  – CrossUnifProb – uniform crossover probability
  – CrossProb – crossover probability
  – MutProb – mutation probability
  – GlobalMinIndiv – individual with the minimal fitness function value from all generations
  – GlobalMaxIndiv – individual with the maximal fitness function value from all generations
  – CurrentMinIndiv – individual with the minimal fitness function value from the current generation
  – CurrentMaxIndiv – individual with the maximal fitness function value from the current generation
• Clear[] – clears all declared variables
• Clear[x] – clears the variable x

Functions are characterised by round brackets ( ). Three dots in (...) mean that it is a function with a variable number of arguments. All the functions can be nested as many times as we want.

Function list:

• vect(...) – the vect function lets us input a tensor of valence described by a square number. For instance, a tensor of valence one is just a vector. To obtain a vector composed of the three coordinates a, b, c we must write

vect(a, b, c)

There is a shorter and more comfortable way of inputting tensors by means of curly brackets. The previous example now looks like this

{a, b, c}

A tensor of valence two – a 2 × 3 matrix – looks like this

{{A, B, C}, {D, E, F}}

or in traditional notation

( A B C )
( D E F )

A 3 × 2 matrix

{{A, B}, {C, D}, {E, F}}

in traditional notation is

( A B )
( C D )
( E F )

• sin() – calculates the sine function. Arguments should be given in radians. To convert degrees to radians we multiply them by deg. For example
sin(45 * deg)
gives √2/2, or numerically
0.707107
It is possible to calculate the sine of complex arguments or of objects other than scalars. For instance, the sine of a vector
sin({I, pi, a})
gives the vector
{1.1752*I, 1.22515e-16, sin(a)}
that is, the sine of all components. If the argument is symbolic, then the results are symbolic as well

• cos(), tg(), ctg(), sinh(), cosh(), tgh(), ctgh()
• arcsin() – calculates the arcsine of a complex argument. The result is given in radians. To convert it to degrees we divide by deg. For instance, for √2/2 the angle equals 45:
arcsin(sqrt(2) / 2) / deg
gives
45

• arccos(), arctg(), arcctg(), arsinh(), arcosh(), artgh(), arctgh()
• log(,) – log(a, b) calculates the logarithm of b to base a. If the result of this operation is c, then a^c == b. For instance
log(2, 3)
gives
1.58496
so
2 ^ log(2, 3)
gives, of course,
3
One can also calculate logarithms of tensors
log(10, {I, pi, a})
result
{0.682188*I, 0.49715, log(10,a)}
that is, the function log calculates the logarithm of all components. One can also calculate logarithms to different bases
log({2, 3, e}, {I, pi, a})
result
{2.26618*I, 1.04198, log(2.71828,a)}
The base and the variable we calculate the logarithm of have to be of the same dimension. This is because the notation
log(10, {I, pi})
is equivalent to
{log(10, I), log(10, pi)}
This is obvious when we compare them:
{log(10,I), log(10,pi)} == log(10, {I, pi})
we obtain 1 as a result (logical true)
1

• ln() – calculates the natural logarithm. For instance
ln(2)
gives
0.693147
The notation
ln(2)
is equivalent to
log(e, 2)
If we compare them
ln(2) == log(e, 2)
we will see 1 (true)
1

• exp() – calculates the exponential function of x, i.e. e^x. We can check it:
e^2 == exp(2)
as a result we will see logical 1
1
• abs() – calculates the absolute value. For instance
abs(I)
gives
1
• sqrt() – calculates the square root. For example
sqrt(pi)
gives
1.77245
It is the same as
pi^0.5

• re() – calculates the real part of a complex number. For instance
re(2+I)
gives
2
• im() – calculates the imaginary part of a complex number. For instance
im(2+4*I)
gives
4
• arg() – calculates the argument of a complex number (i.e. the angle in radians), where arg(x) ∈ ]−π, π]. For instance
arg(I) / deg
gives the result in degrees
90
• conj() – calculates the complex conjugate of the given argument. For instance
conj(2+I)
gives
2-I

• D(,) – calculates the symbolic partial derivative of a symbolic function. For instance, d/dx sin(x)
D(sin(x), x)
gives
cos(x)
One can calculate a composite derivative. If y equals x^2
y = x^2
then
D(sin(y), x)
gives
cos(x^2) * 2*x
The logarithmic derivative
D(x^x, x)
gives
x^x * (ln(x) + 1)
We can also calculate multiple derivatives, for instance d^2/dx^2 sin(x)
D(D(sin(x),x),x)
gives
-sin(x)

We can also calculate derivatives of tensors
• sum(,,,) – calculates a symbolic sum. If it is possible to give the result as a number, then the function sum will do it. For instance, the sum of the numbers from 1 to 100, i.e. ∑_{j=1}^{100} j,
sum(j,j,1,100)
equals
5050
If it is not possible to return a numerical result, then the result will be given in symbolic form. For instance, ∑_{j=1}^{3} x^j
sum(x^j,j,1,3)
gives
x + x^2 + x^3
One must take care with the sum index: it cannot be a variable that has already been declared. We can calculate multiple sums by nesting one in another
• product(,,,) – calculates a symbolic product


• floor() – gives the integer part of a real number
• ceil() – gives the closest integer number greater than the argument
• round() – gives the closest integer to a real number

List of operators according to priority:

= assignment
|| or
&& and
!= unequal
== equal
>= greater or equal
> greater
<= less or equal
< less
− minus
+ plus
/ divide
: divide
∗ times
! not
˜ neg (changing the sign)
ˆ power
{} curly brackets
() round brackets
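As a quick check of these priorities (a hypothetical console session, assuming the operators behave as listed above),

1 + 2 * 3 ^ 2 == 19

should give 1 (true), since ˆ binds more strongly than ∗, which in turn binds more strongly than +.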

B.4 Script writing

If the first character in a line is ‘;’, then the line is treated as a comment and is ignored.

Section [VARIABLES] defines the variable types and ranges (if applicable). It works the same as the page ‘Variables’ in the dialog box ‘Initialise’. First we have to specify the name of the variable, then its type and range (if applicable). For instance

[VARIABLES]

x = real [0; 3.1415]

z = integer [0; 32]

b = boolean

Section [FUNCTION] defines the function-specifying method: manual, analytical or external. It works almost the same as the page ‘Function’ in the dialog box ‘Initialise’. If we decide to supply the function by an analytical equation, then we have to write ‘analytical = 1’ and specify the equation(s) in section [EQUATIONS]. If our choice is to deliver the function externally, then we also have to specify the relevant file names (see below)

[FUNCTION]

manual = 0

analytical = 0


external = 1

input_name = input.txt

external_name = external.exe

output_name = output.txt

Section [EQUATIONS] is important only if we supply the function value analytically. It works like the ‘I/O Console’. Each line needs to contain a variable name, the character ‘=’ and finally an equation. The optimised function must be called ‘f’. We can specify as many variables as we want. For instance

[EQUATIONS]

a = 1

b = 5.1/4/pi^2

c = 5/pi

d = 6

e_ = 10

f_ = 1/8/pi

f = a*(x2-b*x1^2+c*x1-d)^2 + e_*(1-f_)*cos(x1) + e_

Section [SHOW] mainly allows us to decide which windows we want to see during the optimisation process. The exception is ‘buffer individuals’, where we decide whether or not to buffer individuals. Option ‘arrange’ tells AGA to auto-arrange all windows. The section defines almost all the parameters from the page ‘General’ in the dialog box ‘Initialise’; the only exception is the population size, which is defined in section [POPULATION].

[SHOW]

statistic_history = 0

buffer_individuals = 0

buffered = 0

population = 0

population_history = 0

variables = 0

arrange = 1

Section [OPTIONS] defines the same parameters as the page ‘General’ in the dialog box ‘Options’

[OPTIONS]

maximisation = 0

loop_counter = 20

max_iteration = 100

close = 0

Section [SELECTION] defines the same parameters as the page ‘Selection’ in the dialog box ‘Options’

[SELECTION]

tournament = 1

tournament_size = 3

roulette = 0

elite = 0


Section [CROSSOVER] defines the same parameters as the page ‘Crossover’ in the dialog box ‘Options’

[CROSSOVER]

probability = 0.7

multipoint = 0

points = 1

arithmetical = 1

uniform = 0

uniform_prob = 0.5

Section [MUTATION] defines the same parameters as the page ‘Mutation’ in the dialog box ‘Options’

[MUTATION]

probability = 0.15

uniform = 0

nonuniform = 1

uniformity_coeff = 2

Section [POPULATION] defines the same parameters as the page ‘Population’ in the dialog box ‘Options’, and additionally the population size.

[POPULATION]

size = 30

constatnt = 1

variable = 0

Section [EXPORT_NAMES] contains some parameters from the page ‘Export’ in the dialog box ‘Initialise’.

[EXPORT_NAMES]

; export_name = export.txt

; export_every_iteration = 1

Section [EXPORT] contains the variable names and definitions (equations) we want to export. It works like the page ‘Export’ in the dialog box ‘Initialise’. First we have to specify our name, then the character ‘=’ and finally an internal name or equation.

[EXPORT]

; gen = Generation

; average = Average

; standard_deviation = StdDev

; mutation_number = Mutations

; cross_num = Crossovers

; populations_size = PopSize

; diff = Different

; name1 = MeanDev

; name2 = MedianDev


; range = Range

; Q1 = Q1

; Q2 = Q2

; Q3 = Q3

; IQR = IQR

; name4 = Entropy

; D = Discrepancy

; max = GlobalMax

; min = GlobalMin

; current_max = CurrentMax

; current_min = CurrentMin

; crossover_points = CrossPoints

; tournament_size = TourSize

; buffered = Buffered

; uniform_cross_probability = CrossUnifProb

; cross_probability = CrossProb

; mutation_probability = MutProb

;

; a = GlobalMinIndiv

; b = GlobalMaxIndiv

; c = CurrentMinIndiv

; d = CurrentMaxIndiv

;

; size = 2^Entropy

Section [PLOTS] defines the same parameters as the page ‘Plots’ in the dialog box ‘Initialise’. First we must specify the name of the window, then the character ‘=’ and finally a variable name or equation. If we want to see more graphs in one window, then we separate the equations by means of ‘;’.

[PLOTS]

Convergence = Average; CurrentMin

; name1 = 2^Entropy

; iqr = Q3-Q1

quartiles = Q1; Q2; Q3

; name4 = PopSize; Different; 2^Entropy