Researchers in economics and finance looking for a modern general-purpose programming language have four choices – Julia, MATLAB, Python, and R.
We have compared these four languages twice before here on Vox (Danielsson and Fan 2018, Aguirre and Danielsson 2020). Still, as all four are in active development, the landscape has changed considerably since the last time, so it is worthwhile revisiting the question.
Three of these four – MATLAB, Python, and R – date back several decades, bringing advantages and problems. They are mature but also suffer from incremental changes over the years, so they can be archaic, inconsistent and slow. Julia is modern, carrying none of the baggage of the other three, but at the cost of less maturity and familiarity. Not surprisingly, it has been adopted in high quality projects, such as Quantitative Economics with Julia, popularised by Thomas Sargent (Perla et al., 2022).
This leaves the question of why we are not discussing Stata, which might be the most used statistical programme in economics, and one we have used extensively. The reason is that while Stata offers much better algorithms than any of the four languages, users engage with algorithms written by Stata rather than writing their own. It is not a general-purpose programming language, which is our interest in this piece. One could conceivably do everything Stata offers in one of the four languages, while the reverse is certainly not true.
We start by evaluating computation speed, with all code available on the web appendix at https://modelsandrisk.org/appendix/speed_2022.
The first comparison is the calculation of a GARCH log-likelihood function. It is iterative and non-vectorisable, with a nontrivial computation time, making it an excellent test for speed.
We use all four languages in their standard forms, and for Python, we also consider the just-in-time compiler package Numba, which significantly speeds up Python calculations in those specific cases where it can be used. For Julia, we use two versions: standard, and without bounds checking, @inbounds. We normalise all results to a pure C implementation to establish a speed baseline.
Figure 1 GARCH log-likelihood speed
As expected, C is the fastest, followed by Python with Numba, Julia, MATLAB, R and pure Python. Compared to the same experiment run in 2020, MATLAB has become slower and pure Python faster, while R and Julia have the same speed.
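To make the first test concrete, the following is a minimal Python sketch of a Gaussian GARCH(1,1) log-likelihood. It is not the code from the web appendix: the function name `garch_loglik`, the parameter names, and the choice to initialise the conditional variance at the sample variance are assumptions made for illustration. It shows why the calculation cannot be vectorised: each conditional variance depends on the previous one, so the loop has to run in order.

```python
import math


def garch_loglik(returns, omega, alpha, beta):
    """Gaussian GARCH(1,1) log-likelihood computed by explicit recursion.

    Illustrative sketch only; names and initialisation are assumptions.
    """
    n = len(returns)
    # Assumption: start the recursion from the sample variance of the returns.
    mean = sum(returns) / n
    sigma2 = sum((r - mean) ** 2 for r in returns) / n

    loglik = 0.0
    for t in range(1, n):
        # Today's variance depends on yesterday's return and yesterday's variance,
        # which is what prevents the loop from being vectorised.
        sigma2 = omega + alpha * returns[t - 1] ** 2 + beta * sigma2
        loglik += -0.5 * (
            math.log(2.0 * math.pi) + math.log(sigma2) + returns[t] ** 2 / sigma2
        )
    return loglik


if __name__ == "__main__":
    sample = [0.010, -0.020, 0.015, 0.003, -0.007]
    print(garch_loglik(sample, omega=1e-5, alpha=0.1, beta=0.85))
```

A plain loop over scalars like this one is the kind of code where a just-in-time compiler can help most, which is consistent with the gap between pure Python and Python with Numba reported above.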
For our second experiment, we look at the loading speed for a large CSV file, both compressed (600 MB) and uncompressed (over 3 GB). The specific data is all stocks in the CRSP database from 1928 until March 2022.
Figure 2 Loading a large data file
Just as we found in 2020, R is the fastest for both compressed and uncompressed files, followed by Julia and then Python, with MATLAB significantly the slowest. MATLAB does not support loading compressed files.
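As a rough illustration of how such a loading test can be set up, the sketch below times reading the same table from a plain and a gzip-compressed CSV file using only the Python standard library. It is not the benchmark code from the web appendix. The file names, the column names `permno`, `date` and `ret`, and the tiny synthetic data set are all assumptions standing in for the CRSP file.

```python
import csv
import gzip
import os
import tempfile
import time


def load_csv(path):
    """Read a CSV file into a list of dicts; .gz files are decompressed on the fly."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, mode="rt", newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def timed_load(path):
    """Return (rows, seconds) for a single load of the file at `path`."""
    start = time.perf_counter()
    rows = load_csv(path)
    return rows, time.perf_counter() - start


def write_sample_files(directory, n_rows=1000):
    """Write the same synthetic table as a plain and a gzip-compressed CSV."""
    plain = os.path.join(directory, "sample.csv")
    packed = plain + ".gz"
    for path, opener in ((plain, open), (packed, gzip.open)):
        with opener(path, mode="wt", newline="", encoding="utf-8") as handle:
            writer = csv.writer(handle)
            writer.writerow(["permno", "date", "ret"])  # hypothetical columns
            for i in range(n_rows):
                writer.writerow(
                    [10000 + i % 5, f"2020-01-{i % 28 + 1:02d}", f"{(i % 7 - 3) / 1000:.3f}"]
                )
    return plain, packed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for path in write_sample_files(tmp):
            rows, seconds = timed_load(path)
            print(os.path.basename(path), len(rows), f"{seconds:.4f}s")
```

On a file this small the timings are dominated by noise; the point is only the shape of the comparison, with the identical table loaded from both formats.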
In our final timing experiment, we use the CRSP file loaded above, and calculate the standard deviation of returns for each year and stock (see Figure 3).
Figure 3 Large data set calculation
Julia is still the fastest, and is now relatively faster than in 2020. Python has moved up a place, at the expense of R. MATLAB stays significantly behind with a worse relative time than in 2020.
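The calculation in this third experiment is a standard group-by: collect the returns for each stock and year, then take the standard deviation within each group. A minimal standard-library sketch of that logic follows. The function name `yearly_volatility` and the input layout (stock identifier, ISO date string, return) are assumptions for illustration, and in practice one would use a data-frame library rather than plain dictionaries.

```python
from collections import defaultdict
from statistics import stdev


def yearly_volatility(rows):
    """Sample standard deviation of returns for each (stock, year) pair.

    `rows` is an iterable of (stock_id, iso_date, ret) tuples, where
    `iso_date` looks like "2020-01-31". This layout is an assumption.
    """
    groups = defaultdict(list)
    for stock_id, iso_date, ret in rows:
        year = int(iso_date[:4])  # the first four characters hold the year
        groups[(stock_id, year)].append(float(ret))
    # The sample standard deviation needs at least two observations,
    # so groups with a single return are skipped.
    return {key: stdev(values) for key, values in groups.items() if len(values) >= 2}


if __name__ == "__main__":
    data = [
        ("A", "2020-01-02", 0.01),
        ("A", "2020-01-03", 0.03),
        ("A", "2021-01-04", 0.02),
        ("B", "2020-01-02", -0.01),
        ("B", "2020-01-03", 0.01),
    ]
    for key, value in sorted(yearly_volatility(data).items()):
        print(key, round(value, 6))
```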
These findings are in line with the results of Aruoba and Fernández-Villaverde (2018), Coleman et al. (2021) and Markwick (2022).
Programmers increasingly rely on community support in their work. While the Stack Overflow website is the most popular, other more specific websites are also beneficial. We look at the community support from several directions, especially the number of questions on Stack Overflow and the number of public repositories for each language on GitHub. All four languages have a vibrant online community, helping researchers. Python has the largest community, benefiting from its widespread use outside of the type of scientific computations considered here. R also has a large community, followed by MATLAB and finally Julia.
Learning and using the languages
The four languages come with varying amounts of learning materials.
Here, MATLAB has the advantage. Its documentation is the best in class. It is easily searchable and emphasises practical applications. R’s documentation is also excellent but is more confusing and inconsistent and can be hard to navigate.
Python’s numerical programming documentation is decidedly inferior. It is more focused on computer science theory and less on applications, making it hard to navigate.
Julia’s documentation is the worst. Some parts are excellent, but by and large it would profit from focusing on practical uses of code instead of computer science arcana.
MATLAB and R benefit from excellent integrated development environments, the MATLAB desktop and RStudio, while Julia and Python require programmers to use a general-purpose editor and then separately access the language environments.
That said, one can develop in a powerful browser-based environment, Jupyter, in all four languages. However, Jupyter is best suited for the smallest projects.
Language and syntax
Three of the languages – MATLAB, Python and R – suffer from being developed over many decades, with language features added incrementally and inconsistently. In addition, Python has the disadvantage of initially being designed for other uses, with numerical programming only added later. As a result, its programming syntax for numerical programming is inferior to the other three, and inconsistent with Python generally and what one might expect in a numerical programming language (Driscoll 2019).
Being conceived as a modern numerical programming language, Julia has a clean and consistent syntax, so it has the fastest programming speed with the fewest errors, as things generally work as one might expect.
All four languages have the necessary core functionality for numerical programming, but one always needs libraries for serious use.
Python’s library support for economics and finance applications is the worst of the four, with two important exceptions – machine learning and data pipelines – which are best in class.
The libraries supplied by MATLAB’s commercial vendor are generally excellent, but they are limited. Because of MATLAB’s commercial nature, very few outside libraries are available, and the one we use no longer works because of changes to MATLAB core syntax. MATLAB users are, therefore, much more dependent on coding up their own libraries than users of other languages.
Julia has a disadvantage in being a recent language, and its library ecosystem, while growing rapidly, is more immature than that of Python and R.
The library support in R for economics and finance applications is by far the best. It benefits from decades of use, and researchers who release computational libraries overwhelmingly prefer R.
That said, three of these languages – Python, R and Julia – can easily run code in a different language. So within the same source file, one could use Python for data handling, R for plotting and Julia for fast computations. We have done so in several applications, and it works quite well.
Researchers often depend on code written years, even decades ago. Revise and resubmit cycles can be very long, and the research teams in central banks and other institutions need to run the same analysis on new data over many years.
Consequently, backward compatibility, that is, whether the same code will run for a long time as languages and libraries evolve, is of considerable benefit. It is risky and costly to rewrite existing code every couple of years because of language changes.
There is a considerable difference in backward compatibility in these four languages.
The worst offender is Python, especially the key libraries NumPy and Pandas. We have experienced repeated cases where code recommended at one point is deprecated and will not run one or two years later. This can lead to bugs that are hard to diagnose and fix. Perhaps Python’s biggest problem is dependency management, that is, how to handle different versions of Python and libraries. Because of how intimately specific libraries are tied to particular Python releases, and because of frequent code-breaking changes, one may need to manage multiple Python and library versions simultaneously, a non-trivial undertaking.
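One modest safeguard, sketched below under our own assumptions rather than as a recommended standard, is to store the interpreter and library versions alongside every set of results, so that a later failure can be traced to a version change. The function name `environment_snapshot` and the output layout are hypothetical; `importlib.metadata` is part of the Python standard library.

```python
import json
import platform
from importlib import metadata


def environment_snapshot(packages):
    """Record the Python version and the installed version of each package.

    Packages that are not installed are recorded as None instead of raising.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return {"python": platform.python_version(), "packages": versions}


if __name__ == "__main__":
    # The package list is an example; use whatever the analysis imports.
    snapshot = environment_snapshot(["numpy", "pandas"])
    print(json.dumps(snapshot, indent=2))
```

This does not solve the dependency problem, but it does document which combination of versions a given result was produced with.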
MATLAB frequently changes its language, even commonly used core functions. While MATLAB provides a toolbox allowing programmers to update code as versions change, it is no substitute for code stability. Because of its commercial nature, it may be impossible to run multiple versions on the same computer.
Julia promises backward compatibility for core functions. While that guarantee does not extend to all libraries, some key libraries make similar commitments, and we expect Julia's backward compatibility to be excellent. This, however, remains untested. Julia has dependency management facilities that work pretty well but are poorly documented and hard to use.
Backward compatibility in R has been excellent, and we routinely run code written a decade ago or longer with no issues. R does not provide facilities for dependency management.
Docker is generally the best way to get backward compatibility and to ensure reproducible results, regardless of language.
All four languages are fast enough for most applications, while time critical code is often written in C or Fortran. Python is by far the slowest of the four languages, but one can use Numba in specific cases to make it the fastest. Julia is generally the fastest, so for most researchers who need speed and write their own code, Julia is the language we would recommend.
All four languages come with excellent parallel computing facilities, with Python and especially Julia the best of the four. One can significantly benefit from using the GPU for computations in special cases. All four languages easily support GPU programming.
We cannot make any general recommendation as to the best numerical programming language. They are all excellent, and if one is particularly familiar with one of them, there is usually no reason to switch.
However, when starting out, or in particular applications, one of these languages is generally the best.
The one hardest to recommend is MATLAB. Not only is it very expensive, it is slow and has the worst library support. We can only see MATLAB as useful if one is already working on projects that use MATLAB.
Python is the best language for data pipelines and machine learning applications but not otherwise.
R is the best overall language. It has by far the best library support and, while slow, one can overcome that by embedding C++ code. However, R is archaic and inconsistent, resulting in hard-to-diagnose bugs, as language design decisions made decades ago hamper R today.
Julia is best from a pure language perspective. It does not have any historical baggage, and the language is clean and modern. It is by far the fastest of the four. Its weakness is library support and documentation. We recommend Julia for those writing their own code to solve complex, time-consuming problems.
Aguirre, A and J Danielsson (2020), “Which programming language is best for economic research: Julia, Matlab, Python or R?”, VoxEU.org, 20 August.
Aruoba, S and J Fernández-Villaverde (2018), “A Comparison of Programming Languages in Economics: An Update”.
Coleman, C, S Lyon, L Maliar et al (2021), “Matlab, Python, Julia: What to Choose in Economics?”, Computational Economics 58:1263–1288.
Danielsson, J and J R Fan (2018), “Which numerical computing language is best: Julia, MATLAB, Python or R?”, VoxEU.org, 9 July.
Driscoll, T (2019), “Matlab vs. Julia vs. Python”.
Markwick, D (2022), “Fitting Mixed Effects Models - Python, Julia or R?”, juliabloggers.
Perla, J, T Sargent and J Stachurski (2022), “Quantitative Economics with Julia”.