
About this book

Numerical Python by Robert Johansson shows you how to leverage the numerical and mathematical modules in Python and its standard library, as well as popular open source numerical Python packages like NumPy, FiPy, matplotlib, and more, to numerically compute solutions and build mathematical models of applications in areas such as big data, cloud computing, financial engineering, and business management.

After reading and using this book, you will have worked through case study examples of applications in areas such as business management, big data/cloud computing, financial engineering (e.g., options trading and investment alternatives), and even games.

Until recently, Python was regarded mostly as a web scripting language, but computational scientists and engineers have discovered its flexibility and power for much more. Big data analytics and cloud computing programmers are putting Python to heavy use, and financial engineers now employ it in their work as well. Python is evolving into a language that can rival C++, Fortran, and Pascal/Delphi for numerical and mathematical computation.

Table of Contents

Chapter 1. Introduction to Computing with Python

Abstract
This book is about using Python for numerical computing. Python is a high-level, general-purpose interpreted programming language that is widely used in scientific computing and engineering. As a general-purpose language, Python was not specifically designed for numerical computing, but many of its characteristics make it well suited for this task. First and foremost, Python is well known for its clean and easy-to-read code syntax. Good code readability improves maintainability, which in general results in fewer bugs and better applications overall, and it also encourages rapid code development. This readability and expressiveness are essential in exploratory and interactive computing, which requires fast turnaround for testing various ideas and models.
Robert Johansson

Chapter 2. Vectors, Matrices, and Multidimensional Arrays

Abstract
Vectors, matrices, and arrays of higher dimensions are essential tools in numerical computing. When a computation must be repeated for a set of input values, it is natural and advantageous to represent the data as arrays and the computation in terms of array operations. Computations that are formulated this way are said to be vectorized. Many modern processors provide instructions that operate on arrays. These are also known as vectorized operations, but here vectorized refers to high-level array-based operations, regardless of how they are implemented at the processor level. Vectorized computing eliminates the need for many explicit loops over the array elements by applying batch operations on the array data. The result is concise and more maintainable code, and it enables delegating the implementation of (for example, elementwise) array operations to more efficient low-level libraries. Vectorized computations can therefore be significantly faster than sequential element-by-element computations. This is particularly important in an interpreted language such as Python, where looping over arrays element-by-element entails a significant performance overhead.
Robert Johansson
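
A minimal sketch of the vectorized style described above, using NumPy (an illustration of the idea, not an excerpt from the chapter):

import numpy as np

# Evaluate f(x) = x**2 + 1 for an entire array of inputs in one batch
# operation, instead of looping over the elements in Python
x = np.linspace(0, 10, 11)   # 11 uniformly spaced sample points
y = x**2 + 1                 # elementwise operations run in compiled code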

Chapter 3. Symbolic Computing

Abstract
Symbolic computing is an entirely different paradigm in computing compared to the numerical array-based computing introduced in the previous chapter. In symbolic computing software, also known as computer algebra systems (CASs), representations of mathematical objects and expressions are manipulated and transformed analytically. Symbolic computing is mainly about using computers to automate analytical computations that can in principle be done by hand with pen and paper. However, by automating the bookkeeping and the manipulation of mathematical expressions using a computer algebra system, it is possible to take analytical computing much further than can realistically be done by hand. Symbolic computing is a great tool for checking and debugging analytical calculations that are done by hand, but more importantly it enables analytical computations that may not otherwise be possible.
Robert Johansson
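
A small sketch of symbolic manipulation, here using SymPy (the library choice is an assumption; the abstract does not name one):

import sympy as sp

x = sp.symbols('x')
# Verify an algebraic identity analytically rather than numerically
sp.simplify((x + 1)**2 - (x**2 + 2*x + 1))   # -> 0
# Automate a pen-and-paper calculation: differentiate symbolically
sp.diff(sp.sin(x**2), x)                     # -> 2*x*cos(x**2)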

Chapter 4. Plotting and Visualization

Abstract
Visualization is a universal tool for investigating and communicating results of computational studies, and it is hardly an exaggeration to say that the end product of nearly all computations – be they numeric or symbolic – is a plot or a graph of some sort. It is when visualized in graphical form that knowledge and insights can be most easily gained from computational results. Visualization is therefore a tremendously important part of the workflow in all fields of computational studies.
Robert Johansson

Chapter 5. Equation Solving

Abstract
In the previous chapters we have discussed general methodologies and techniques, namely array-based numerical computing, symbolic computing, and visualization. These methods are the cornerstones of scientific computing that make up a fundamental toolset we have at our disposal when attacking computational problems.
Robert Johansson
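
A sketch of the kinds of problems the chapter title refers to, assuming NumPy and SciPy as the tools (the abstract itself does not name them):

import numpy as np
from scipy import optimize

# Linear system of equations A x = b
A = np.array([[2.0, 3.0], [5.0, 4.0]])
b = np.array([4.0, 3.0])
x = np.linalg.solve(A, b)

# Nonlinear equation x + cos(x) = 0, solved numerically on a bracketing interval
root = optimize.brentq(lambda x: x + np.cos(x), -2, 2)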

Chapter 6. Optimization

Abstract
In this chapter, we will build on Chapter 5 about equation solving, and explore the related topic of solving optimization problems. In general, optimization is the process of finding and selecting the optimal element from a set of feasible candidates. In mathematical optimization, this problem is usually formulated as determining the extreme value of a function on a given domain. An extreme value, or an optimal value, can refer to either the minimum or the maximum of the function, depending on the application and the specific problem. In this chapter we are concerned with optimization of real-valued functions of one or several variables, which optionally can be subject to a set of constraints that restricts the domain of the function.
Robert Johansson
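
For instance, finding the minimum of a real-valued function of two variables might look like this with SciPy (a hedged sketch; the library and test function are illustrative choices):

from scipy.optimize import minimize

# Minimize the Rosenbrock function, whose known minimum is at (1, 1)
f = lambda x: (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2
res = minimize(f, x0=[0.0, 0.0], method='BFGS')
res.x   # approximately [1.0, 1.0]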

Chapter 7. Interpolation

Abstract
Interpolation is a mathematical method for constructing a function from a discrete set of data points. The interpolation function, or interpolant, should exactly coincide with the given data points, and it can also be evaluated for other intermediate input values within the sampled range. There are many applications of interpolation: a typical use case that provides an intuitive picture is the plotting of a smooth curve through a given set of data points. Another use case is to approximate complicated functions, which, for example, could be computationally demanding to evaluate. In that case, it can be beneficial to evaluate the original function only at a limited number of points and use interpolation to approximate the function when evaluating it at intermediate points.
Robert Johansson
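
A minimal sketch of the use case described above, constructing an interpolant that passes exactly through sampled data points (using SciPy; an illustrative choice, not named in the abstract):

import numpy as np
from scipy.interpolate import CubicSpline

x = np.linspace(0, 2 * np.pi, 10)   # a discrete set of sample points
y = np.sin(x)                       # the data to interpolate
f = CubicSpline(x, y)               # interpolant coincides with (x, y) exactly
f(1.5)                              # evaluate between the sample points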

Chapter 8. Integration

Abstract
In this chapter we cover different aspects of integration, with the main focus on numerical integration. For historical reasons, numerical integration is also known as quadrature. Integration is significantly more difficult than its inverse operation – differentiation – and while there are many examples of integrals that can be calculated symbolically, in general we have to resort to numerical methods. Depending on the properties of the integrand (the function being integrated) and the integration limits, it can be easy or difficult to numerically compute an integral. Integrals of continuous functions with finite integration limits can in most cases be computed efficiently in one dimension, but integrable functions with singularities or integrals with infinite integration limits are examples of cases that can be difficult to handle numerically, even in a single dimension. Double integrals and higher-order integrals can be numerically computed with repeated single-dimension integration, or using methods that are multidimensional generalizations of the techniques used to solve single-dimensional integrals. However, the computational complexity grows quickly with the number of dimensions to integrate over, and in practice such methods are only feasible for low-dimensional integrals, such as double integrals or triple integrals. Integrals of higher dimension than that often require completely different techniques, such as Monte Carlo sampling algorithms.
Robert Johansson
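
A small illustration of numerical quadrature (using SciPy, an assumed choice), including an infinite integration limit of the kind mentioned above:

import numpy as np
from scipy import integrate

# Integral of exp(-x**2) from 0 to infinity; the exact value is sqrt(pi)/2
value, error_estimate = integrate.quad(lambda x: np.exp(-x**2), 0, np.inf)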

Chapter 9. Ordinary Differential Equations

Abstract
Equations wherein the unknown quantity is a function, rather than a variable, and that involve derivatives of the unknown function are known as differential equations. An ordinary differential equation is the special case where the unknown function has only one independent variable with respect to which derivatives occur in the equation. If, on the other hand, derivatives with respect to more than one variable occur in the equation, then it is known as a partial differential equation, and that is the topic of Chapter 11. Here we focus on ordinary differential equations (in the following abbreviated as ODEs), and we explore both symbolic and numerical methods for solving this type of equation. Analytical closed-form solutions to ODEs often do not exist, but for many special types of ODEs there are analytical solutions, and in those cases there is a chance that we can find solutions using symbolic methods. If that fails, we must, as usual, resort to numerical techniques.
Robert Johansson
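
A minimal sketch of the numerical route, for an ODE whose analytical solution is known (SciPy is an assumed tool here):

import numpy as np
from scipy.integrate import solve_ivp

# Solve y'(t) = -2 y(t) with y(0) = 1 on the interval [0, 4];
# the exact solution is exp(-2 t), so the numerical result can be checked
sol = solve_ivp(lambda t, y: -2 * y, (0, 4), [1.0],
                t_eval=np.linspace(0, 4, 9))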

Chapter 10. Sparse Matrices and Graphs

Abstract
We have already seen numerous examples of arrays and matrices being the essential entities in many aspects of numerical computing. So far we have represented arrays with the NumPy ndarray data structure, which is a dense representation that stores all the elements of the array that it represents. In many cases, this is the most efficient way to represent an object such as a vector, matrix, or a higher-dimensional array. However, notable exceptions are matrices where most of the elements are zeros. Such matrices are known as sparse matrices, and they occur in many applications, for example, in connection networks (such as circuits) and in the large algebraic equation systems that arise, for example, when solving partial differential equations (see Chapter 11 for examples).
Robert Johansson
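
A small sketch contrasting the two representations with SciPy's sparse module (illustrative; only the nonzero elements of a tridiagonal matrix are stored):

import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import spsolve

N = 1000
# Tridiagonal matrix: about 3 N nonzero elements instead of N**2 stored values
A = diags([1, -2, 1], offsets=[-1, 0, 1], shape=(N, N), format='csr')
b = np.ones(N)
x = spsolve(A, b)   # solve A x = b without ever forming the dense matrix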

Chapter 11. Partial Differential Equations

Abstract
Partial differential equations (PDEs) are multivariate differential equations where derivatives with respect to more than one independent variable occur. That is, the derivatives in the equation are partial derivatives. As such, they are generalizations of ordinary differential equations, which were covered in Chapter 9. Conceptually, the difference between ordinary and partial differential equations is not that big, but the computational techniques required to deal with ODEs and PDEs are very different, and solving PDEs is typically much more computationally demanding. Most techniques for solving PDEs numerically are based on the idea of discretizing the problem in each independent variable that occurs in the PDE, thereby recasting the problem into an algebraic form. This usually results in very large-scale linear algebra problems. Two common techniques for recasting PDEs into algebraic form are finite-difference methods (FDMs), where the derivatives in the problem are approximated by finite-difference formulas, and finite-element methods (FEMs), where the unknown function is written as a linear combination of simple basis functions that can be differentiated and integrated easily. In this representation the unknown function is described by a set of coefficients for the basis functions, and by a suitable rewriting of the PDEs we can obtain algebraic equations for these coefficients.
Robert Johansson
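
A minimal one-dimensional finite-difference sketch, showing how discretization turns a differential equation into a sparse algebraic one (illustrative code, assuming SciPy):

import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import spsolve

# Discretize u''(x) = -1 on [0, 1] with u(0) = u(1) = 0 at N interior points;
# the second derivative is replaced by its finite-difference formula
N = 100
dx = 1.0 / (N + 1)
A = diags([1, -2, 1], offsets=[-1, 0, 1], shape=(N, N), format='csc') / dx**2
b = -np.ones(N)
u = spsolve(A, b)   # approximates the exact solution u(x) = x (1 - x) / 2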

Chapter 12. Data Processing and Analysis

Abstract
In the last several chapters we have covered the main topics of traditional scientific computing. These topics provide a foundation for most computational work. Starting with this chapter, we move on to explore data processing and analysis, statistics, and statistical modeling. As a first step in this direction, we look at the data analysis library pandas. This library provides convenient data structures for representing series and tables of data, and it makes it easy to transform, split, merge, and convert data – all important steps in the workflow of processing raw data into a form suitable for analysis.
Robert Johansson
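
For example, representing a small table and performing a split-apply-combine transformation with pandas might look like this (an illustrative sketch with invented data):

import pandas as pd

df = pd.DataFrame({"city": ["Tokyo", "Oslo", "Tokyo"],
                   "value": [1.0, 2.0, 3.0]})
df.groupby("city")["value"].mean()   # split by city, compute the mean per group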

Chapter 13. Statistics

Abstract
Statistics has long been a field of mathematics that is relevant to practically all applied disciplines of science and engineering, as well as business, medicine, and other fields where data is used for obtaining knowledge and making decisions. With the recent proliferation of data analytics there has been a surge of renewed interest in statistical methods. Still, computer-aided statistics has a long history, and it is a field that traditionally has been dominated by domain-specific software packages and programming environments, such as the S language, and more recently its open source counterpart: the R language. The use of Python for statistical analysis has grown rapidly over the last several years, and by now there is a mature collection of statistical libraries for Python. With these libraries Python can match the performance and features of domain-specific languages in many areas of statistics, albeit not all, while also providing the unique advantages of the Python programming language and its environment. The pandas library that we discussed in Chapter 12 is an example of a development within the Python community that was strongly influenced by statistical software, with the introduction of the data frame data structure to the Python environment. The NumPy and SciPy libraries provide computational tools for many fundamental statistical concepts, and higher-level statistical modeling and machine learning are covered by the statsmodels and scikit-learn libraries, which we will see more of in the following chapters.
Robert Johansson
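
A small sketch of the kind of fundamental statistical computations NumPy and SciPy support (illustrative; the data here is simulated):

import numpy as np
from scipy import stats

# Simulate data, fit a distribution by maximum likelihood, and run a t-test
data = stats.norm(loc=1.0, scale=0.5).rvs(size=100, random_state=0)
mu, sigma = stats.norm.fit(data)              # estimated mean and std. dev.
t, p = stats.ttest_1samp(data, popmean=1.0)   # test against the true mean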

Chapter 14. Statistical Modeling

Abstract
In the previous chapter we covered basic statistical concepts and methods. In this chapter we build on the foundation laid out in the previous chapter and explore statistical modeling, which deals with creating models that attempt to explain data. A model can have one or several parameters, and we can use a fitting procedure to find the parameter values that best explain the observed data. Once a model has been fitted to data, it can be used to predict the values of new observations, given the values of the independent variables of the model. We can also perform statistical analysis on the data and the fitted model, and try to answer questions such as whether the model accurately explains the data, which factors in the model are more relevant (predictive) than others, and whether there are parameters that do not contribute significantly to the predictive power of the model.
Robert Johansson
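
A minimal fitting-and-prediction sketch with statsmodels (illustrative; the model and data are invented for the example):

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated data from y = 1 + 2 x + noise, then fit a linear model y ~ x
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 100)
df = pd.DataFrame({"x": x, "y": 1.0 + 2.0 * x + rng.normal(0, 1, 100)})
result = smf.ols("y ~ x", data=df).fit()
result.params                               # estimates close to 1 and 2
result.predict(pd.DataFrame({"x": [5.0]})) # prediction for a new observation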

Chapter 15. Machine Learning

Abstract
In this chapter we explore machine learning. This topic is closely related to statistical modeling, which we considered in Chapter 14, in the sense that both deal with using data to describe and predict outcomes of uncertain or unknown processes. However, while statistical modeling emphasizes the model used in the analysis, machine learning sidesteps the model part and focuses on algorithms that can be trained to predict the outcome of new observations.
Robert Johansson
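
A minimal train-and-predict sketch with scikit-learn (illustrative; the dataset and model choice are assumptions, not from the abstract):

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Train on one part of the data, then predict outcomes of unseen observations
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(random_state=0).fit(X_train, y_train)
clf.score(X_test, y_test)   # accuracy on the held-out observations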

Chapter 16. Bayesian Statistics

Abstract
In this chapter we explore an alternative interpretation of statistics – Bayesian statistics – and the methods associated with this interpretation. Bayesian statistics, in contrast to the frequentist statistics that we used in Chapter 13 and Chapter 14, treats probability as a degree of belief rather than as a measure of proportions of observed outcomes. This different point of view gives rise to distinct statistical methods that can be used in problem solving. While it is generally true that statistical problems can in principle be solved using either frequentist or Bayesian statistics, there are practical differences that make these two approaches suitable for different types of problems.
Robert Johansson
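
A small conjugate-model sketch of probability as a degree of belief (illustrative; the beta-binomial model is chosen here for simplicity):

from scipy import stats

# Beta(1, 1) prior on a success probability, updated with 7 successes
# in 10 trials; the posterior is again a beta distribution
successes, trials = 7, 10
posterior = stats.beta(1 + successes, 1 + trials - successes)
posterior.mean()           # posterior belief about the success probability
posterior.interval(0.95)   # 95% credible interval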

Chapter 17. Signal Processing

Abstract
In this chapter we explore signal processing, which is a subject with applications in diverse branches of science and engineering. A signal in this context can be a quantity that varies in time (a temporal signal) or as a function of space coordinates (a spatial signal). An audio signal is a typical example of a temporal signal, while an image is a typical example of a spatial signal in two dimensions. In reality, signals are often continuous functions, but in computational applications it is common to work with discretized signals, where the original continuous signal is sampled at uniformly spaced discrete points. The sampling theorem gives rigorous and quantitative conditions for when a continuous signal can be accurately represented by a discrete sequence of samples.
Robert Johansson
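
A minimal sketch of a uniformly sampled temporal signal and its frequency content (illustrative, using NumPy's FFT routines):

import numpy as np

# A 1 Hz sine wave sampled at 50 Hz, well above the Nyquist rate of 2 Hz
fs = 50.0
t = np.arange(0, 2, 1 / fs)               # uniformly spaced sample times
signal = np.sin(2 * np.pi * 1.0 * t)
spectrum = np.fft.rfft(signal)
freqs = np.fft.rfftfreq(len(t), d=1 / fs)
freqs[np.argmax(np.abs(spectrum))]        # -> 1.0, the dominant frequency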

Chapter 18. Data Input and Output

Abstract
In nearly all scientific computing and data analysis applications there is a need for data input and output, for example, to load datasets or to persistently store results. Getting data in and out of programs is consequently a key step in the computational workflow. There are many standardized formats for storing structured and unstructured data. The benefits of using standardized formats are obvious: you can use existing libraries for reading and writing data, saving yourself both time and effort. In the course of working with scientific and technical computing, it is likely that you will face a variety of data formats through interaction with colleagues and peers, or when acquiring data from sources such as equipment and databases. As a computational practitioner, it is important to be able to handle data efficiently and seamlessly, regardless of which format it comes in. This motivates why this entire chapter is devoted to this topic.
Robert Johansson
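
For example, writing a dataset to a standardized format and reading it back with an existing library (pandas here, as an illustrative choice) takes only a few lines:

import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3], "y": [0.1, 0.2, 0.3]})
df.to_csv("data.csv", index=False)   # persist as plain-text CSV
df2 = pd.read_csv("data.csv")        # load it back into a DataFrame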

Chapter 19. Code Optimization

Abstract
In this book we have explored various topics of scientific and technical computing using Python and its ecosystem of libraries. As touched upon in the very first chapter of this book, the Python environment for scientific computing generally strikes a good balance between a high-level environment that is suitable for exploratory computing and rapid prototyping – which minimizes development effort – and high-performance numerics – which minimizes application run times. High-performance numerics is achieved not through the Python language itself, but rather through leveraging libraries that contain or use external compiled code, typically written in C or Fortran. Because of this, in computing applications that rely heavily on libraries such as NumPy and SciPy, most of the number crunching is performed by compiled code, and the performance is therefore vastly better than if the same computation were implemented purely in Python.
Robert Johansson
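
A minimal sketch of the contrast described above (illustrative; actual speedups depend on the machine and the computation):

import numpy as np

# Pure Python: the loop is executed by the interpreter, element by element
total = 0
for value in range(1_000_000):
    total += value * value

# NumPy: the same sum of squares runs as a batch operation in compiled code
arr = np.arange(1_000_000, dtype=np.float64)
total_np = np.sum(arr**2)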

Appendix A. Installation

Abstract
This Appendix covers the installation and setup of a Python environment for scientific computing on commonly used platforms. As discussed in Chapter
Robert Johansson