VoxEU Blog/Review Frontiers of economic research

# Which numerical computing language is best: Julia, MATLAB, Python or R?

Julia, MATLAB, Python and R are among the numerical programming languages most commonly used by economic researchers. In this post, Jon Danielsson and Jia Rong Fan compare and contrast these four, reaching a very subjective conclusion as to which is best and which is worst.

A large number of general-purpose numerical programming languages are used by economic researchers. We suspect the most common are MATLAB, Python and R, with Julia increasingly used, helped by Thomas Sargent's endorsement.

This naturally invites the question: which of these is the best?

This is of course highly subjective — depending on the objective, any of these four could be the best choice.

That said, we have specific criteria in mind. One of us has written a book called Financial Risk Forecasting, where risk forecasting methods are implemented in MATLAB and R. The other has recently translated all that code into Julia and Python, all downloadable.

Our starting criterion is how easy it was to implement the algorithms in Financial Risk Forecasting, followed by six others.

## 1. Implementing the Financial Risk Forecasting algorithms

The published book and the accompanying website used R and MATLAB. All required functionality was available, either through built-in methods or from outside libraries. We have built much larger projects with both, never running into any serious language limitations.

We could do most things in Python using NumPy (numerical Python), but it was not trouble-free. Some of the available library code was a bit dodgy, such as GARCH estimation, which had convergence issues, and there was no code for multivariate GARCH or fancier specifications.

With Julia, it was harder to find off-the-shelf libraries. When they existed, it was often unclear which package to use and how to use it. For instance, StatsFuns.jl and Distributions.jl both carry out statistical calculations, but the former does not support vectorisation and has minimal documentation — the uninitiated would not know that StatsFuns.jl was not meant for end-users. There was only one functioning univariate GARCH(1,1) package, with no support for a general GARCH(p,q) or a Student's t conditional distribution. Needless to say, multivariate GARCH was also unavailable.

So in terms of implementing the risk forecasting code, R and MATLAB are the winners, with Julia lagging far behind.

## 2. Language features

R (through its predecessor S) and MATLAB originated in the 1970s and their age shows. Since then, they have evolved erratically.

Some R functions are inconsistent and exhibit problematic behaviour, as shown by the R Inferno.

MATLAB also has its share of undesirable characteristics. For example, its matrix access uses the same bracket type ( ) as function calls, making the code harder to read. The other three use [ ] for indexing and ( ) for function calls, avoiding this problem and reducing errors. MATLAB functions also have to be either at the end of the source file or in separate files.

R and MATLAB are neither type safe nor equipped with proper namespaces, and their packages often override function names, leading to errors that are hard to diagnose. R supports limited object-oriented programming, while MATLAB's object-oriented operations have improved since its R2015b release.

Python is 20 years younger and it is great at what it was designed for (e.g. file processing). It does objects well. For numerical programming, two additional packages are used: pandas for data structures and NumPy for computations. While both of these are powerful, neither looks like it naturally fits into Python. For instance, while data structures should ideally look and behave the same way, pandas and NumPy data structures often have to be converted when moving from one package to the other.

Common calculations (that use natural operations in other languages) often require lengthy function calls in Python. For example, the matrix power is

A^2

in MATLAB and

np.linalg.matrix_power(A,2)

in Python.
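As a small illustration of the point, the sketch below squares a 2×2 matrix in NumPy three ways. The matrix `A` and the variable names are made up for this example. It also shows a common pitfall: `A ** 2` in NumPy squares each entry rather than computing the matrix power, while the `@` operator gives a shorter way to write a matrix product.

```python
import numpy as np

# Hypothetical 2x2 matrix, chosen only for illustration
A = np.array([[1, 2],
              [3, 4]])

# Matrix power, the NumPy equivalent of MATLAB's A^2
matrix_square = np.linalg.matrix_power(A, 2)

# The @ operator is matrix multiplication, so A @ A is the same matrix
also_matrix_square = A @ A

# ** works element by element: this squares each entry of A
elementwise_square = A ** 2

print(matrix_square)
print(elementwise_square)
```

For higher powers, `np.linalg.matrix_power` remains the direct counterpart of the MATLAB operator, since chaining `@` quickly becomes unwieldy.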

Julia is the newcomer and it shows, incorporating state-of-the-art language design features. Unlike the other three, one can optionally use type declarations, and multiprocessor calculations are more natural than in the others. Object orientation is built in, and multiple dispatch is central to its language design.

It also allows Unicode characters in equations, so one can have code with Greek and other characters, like

ß=2

or

Ω+π+æ-∞

Hence in terms of language features, Julia is the clear winner, with R, MATLAB and Python far behind.

## 3. Speed

R, MATLAB and Python are interpreted languages, which by nature incur more processing time. While all now offer just-in-time (JIT) compilation, it may not always help much. Iterative loops are especially slow. The idea behind MATLAB is that this should not really matter, because it was designed for linear algebra, functioning as a front-end to numerical libraries programmed in FORTRAN or C. The same applies to R to a lesser extent.

Both languages use a variety of tricks to speed up computation, offloading common calculations to libraries in C or FORTRAN. If that fails, one can just code up C/C++/FORTRAN within these languages. However, from an implementation point of view, the problem is that all these tricks make the languages more complicated.

The same applies to Python. Cython is commonly used to speed up performance considerably by running portions of the code in C. One can also use Numba, a JIT compiler that requires minimal additional code. However, it can only be used in certain simple cases. For example, it does not support class definitions and exceptions.

Julia, with just-in-time compiling, promises to be as fast as FORTRAN or C. The user does not have to implement tricks to speed up the code, so the language becomes simpler and easier to program.

To compare the speed of these languages, we implemented a simple iterative calculation in each. For reference, an implementation in C was also included. The calculation is the iterative loop for log-likelihood computation in a GARCH(1,1) model for a dataset of length 10,000. We repeated the calculation 1,000 times and recorded the best runtime in the following figure.

When it comes to calculating GARCH likelihood, R is the slowest and Python the fastest, with Julia not far behind.

The speed advantage given by Numba to Python might not extend to more complex projects, where Julia is likely to be faster, as argued by Christopher Rackauckas.

An expanded discussion of the speed comparison is available in our web appendix. For an alternative comparison, see Aruoba and Fernandez-Villaverde’s performance comparison.
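To make the benchmark concrete, here is a minimal Python sketch of the kind of loop described in this section. It is not the benchmark code used for the comparison. The function name `garch11_loglik`, the normal conditional distribution, the choice to start the variance recursion at the sample variance, the simulated data and the parameter values are all assumptions made for illustration. No JIT compiler is used, so this is the plain interpreted version of the loop.

```python
import math
import timeit

import numpy as np


def garch11_loglik(y, omega, alpha, beta):
    """Gaussian log-likelihood of a GARCH(1,1) model, computed with an explicit loop."""
    y = np.asarray(y, dtype=float)
    sigma2 = float(np.var(y))  # assumed starting value: the sample variance
    loglik = 0.0
    for t in range(1, len(y)):
        # Variance recursion: each step depends on the previous one,
        # which is why the calculation is inherently iterative
        sigma2 = omega + alpha * y[t - 1] ** 2 + beta * sigma2
        loglik += -0.5 * (math.log(2 * math.pi) + math.log(sigma2) + y[t] ** 2 / sigma2)
    return loglik


# Simulated data of length 10,000 (stand-in for a real return series)
rng = np.random.default_rng(0)
y = rng.standard_normal(10_000)

# Repeat the calculation and keep the best runtime; 5 repeats here to keep it quick
best = min(timeit.repeat(lambda: garch11_loglik(y, 0.1, 0.1, 0.8), number=1, repeat=5))
print(f"best of 5 runs: {best:.4f} s")
```

Because each variance depends on the previous one, the loop cannot simply be replaced by a single vectorised NumPy expression, which is what makes it a useful test of raw loop speed.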

## 4. Data handling

A lot of research involves large data sets, often in a variety of different data types such as integers, strings, reals, dates, logicals or lists. Processing such data may require filtering and transformation operations.

Data is often read from and written to a number of formats, including text files, CSV files, Excel, SQL databases, noSQL databases and proprietary data formats, either local or remote.

This is where R absolutely shines. It was designed for scientific data, and it shows. It can handle complicated data structures with a variety of formats and origins, with many packages that provide a variety of ways to access and process the data. It can handle data sets that are much bigger than what can fit into memory.

Python is also quite good at this, with its pandas and NumPy libraries able to do many of the same things, including some that R cannot do. But it does not seem as fluid as R. NumPy arrays lack column names, which makes data retrieval less convenient. Numerical programming requires subsetting and changing elements in data structures quickly and efficiently. When using pandas, accessing and changing elements requires special syntax like .iloc/.loc and often explicit type conversion from pandas Series to NumPy arrays and back.

For example, to access an element in DataFrame M, one may have to use

M.iloc[1,:].values[1]

It is much simpler in R

M[2,2]
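The sketch below spells out the pandas side of this comparison on a small made-up DataFrame (the contents of `M` and the column names are assumptions for the example). It shows the access pattern quoted above next to a few shorter pandas alternatives, and the explicit conversion to a NumPy array, at which point the column names are lost.

```python
import numpy as np
import pandas as pd

# Hypothetical 3x3 DataFrame holding the integers 0..8
M = pd.DataFrame(np.arange(9).reshape(3, 3), columns=["a", "b", "c"])

# The pattern from the text: take row 1, convert to NumPy, take element 1
x1 = M.iloc[1, :].values[1]

# Shorter position-based alternatives
x2 = M.iloc[1, 1]
x3 = M.iat[1, 1]

# Label-based access: row label 1, column "b"
x4 = M.loc[1, "b"]

# Explicit conversion to a NumPy array, which drops the column names
arr = M.to_numpy()
x5 = arr[1, 1]

print(x1, x2, x3, x4, x5)
```

All five expressions pick out the same element, the one R reaches with `M[2,2]`, since Python counts from 0 and R from 1.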

MATLAB's support for different data types has improved in recent updates, with different table types for heterogeneous data and categorical arrays. It has import functions for most common file types.

Julia's handling of data is lacking in terms of file types and options supported at present. Moreover, some packages are still going through reorganisation, like the CSV and DataFrames packages for importing CSV files.

So, when it comes to data handling, Julia is the worst, followed by MATLAB and Python, with R being the winner.

## 5. Libraries

Each of these four languages provides a basic infrastructure, but a lot of specialised functionality is offloaded to external libraries.

MATLAB was designed as a numerical language and has a lot of useful functions built in. Moreover, its available libraries are very rich, especially for numerous engineering applications (e.g. signal processing).

R is even better: there is probably a library for almost any statistical functionality one could possibly use. The downside is that some of these are of low quality or are badly documented, and there might be multiple libraries for the same functionality, often with different argument specifications and output types.

Python has a lot of libraries available, but not nearly as many as either R or MATLAB.

Julia, being the newcomer, has the fewest libraries by far.

So in terms of libraries, Julia is worst, followed by Python and MATLAB, with R the best.

That said, Python, Julia and R can all call functions from each other. Thus, libraries in one can be used in all, mitigating the problem somewhat. While this can be useful in special circumstances, it is more natural and stable to just work in one language.

## 6. Licensing

Three of these languages (Julia, Python and R) are open source, while MATLAB is commercial. For pricing see here. This means that the first three are available on almost any platform and one can install them without paying or getting permission.

Heavy computations often get outsourced to either high performance computing clusters or the cloud. We can rent a 72-core machine on Amazon Cloud for \$1.16 an hour, which is 20 times faster than most desktops. For MATLAB, one needs to purchase the Parallel Computing Toolbox and pay \$0.18 (\$0.07 educational) per core per hour (see here). Moreover, that requires considerable time to set up.
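A rough back-of-the-envelope calculation shows what these numbers imply. The sketch assumes that the MATLAB fee is charged for each of the 72 cores and comes on top of the machine rental. Both are assumptions about how the pricing applies, and the variable names are made up for the example.

```python
cores = 72
machine_per_hour = 1.16          # USD per hour for the 72-core machine (from the text)
matlab_per_core_hour = 0.18      # USD per core per hour, standard (from the text)
matlab_edu_per_core_hour = 0.07  # USD per core per hour, educational (from the text)

# Assumption: the fee is charged for every core and added to the machine rental
matlab_fee = cores * matlab_per_core_hour
matlab_edu_fee = cores * matlab_edu_per_core_hour

total_standard = machine_per_hour + matlab_fee
total_edu = machine_per_hour + matlab_edu_fee

print(f"MATLAB fee per hour: {matlab_fee:.2f} (educational {matlab_edu_fee:.2f})")
print(f"Total per hour: {total_standard:.2f} (educational {total_edu:.2f})")
```

Under these assumptions, the MATLAB fee alone comes to \$12.96 an hour (\$5.04 educational), several times the cost of the machine itself.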

Hence in terms of licensing and cost, MATLAB is worst, and the other three equal.

## 7. Ease of use

However, when it comes to ease of use, MATLAB has a good integrated development environment (IDE), the MATLAB desktop, with very good documentation.

R has come a long way, with the RStudio IDE even better than the MATLAB desktop. Shiny allows interactive web apps and dashboards to be built directly from R, providing online-friendly means of data presentation.

R has good plotting functionality, with MATLAB not far behind.

Python's Anaconda distribution bundles a good IDE, Spyder. Plots are mainly done through Matplotlib, with an interface similar to MATLAB's.

Juno for Julia is an IDE integrated with the Atom editor which looks and functions like Spyder. That said, we occasionally experienced teething issues, like error feedback failing to identify the exact source of error. Plots.jl is used for plotting, often relying on packages from other languages.

Because Julia is rather new, its commonly used packages are still undergoing changes from time to time. This has resulted in incomplete or sparse documentation. On many occasions, while translating code from R/MATLAB to Julia, we had to look up the source code to figure out the required settings (if they even existed in the first place). Moreover, many packages still use deprecated subroutines, with frequent warnings popping up when executed.

All four can be used in Jupyter notebooks. A Jupyter notebook implementation of the code from Financial Risk Forecasting is available here. However, while Jupyter notebooks are certainly useful for demonstration and pedagogical purposes, we do not think they are the best environment for day-to-day programming.

Thus, in terms of ease of use, especially for novice users, MATLAB is the best. R and Python trail behind slightly, with Julia having some way to go.

## Summary

None of these four languages leads on all evaluation criteria.

R and MATLAB benefit from being the veterans: one can do almost anything one wants with them. However, their age shows: the languages are outdated, with considerable baggage and inefficiencies.

Python is more modern, but its libraries are lacking in comparison and numerical programming is clumsy.

So, what about Julia? It is a modern language, very elegant and fast. What it lacks at present is comprehensive library support for data handling and numerical calculations. Julia has been under heavy development. However, version 1.0 was recently released, bringing with it feature stability and making it safer to use Julia for long-term projects.

So is there a clear ranking?

Recognising that this assessment is highly subjective, for our purposes R is the best numerical language.

With Julia the one to look out for.