Financial Data Sources for Machine Learning and Trading Algorithms
Robust, reliable data sources can be difficult to find in many domains, but for the financial markets there are quite a few to choose from. In this article, I have collected various sources that can be easily used in any machine learning or algorithmic trading model.
The sources below fall into 3 basic categories: Python package, R package, and stand-alone data source. All 3 generally rely on some type of API. In practice this rarely matters, because many of the implementations hide the API calls behind a few simple functions.
Quick Overview of Data Source Differences
Depending on the type of ML model you are trying to develop and the question you are trying to answer, various factors will come into play. Here are a few things to consider for each of the data sources below to ensure you select the correct one for your purposes.
Time Periods
It is important to consider what level of fidelity each data source offers in terms of time periods. Suffice it to say, the type of data needed for High Frequency Trading is not widely available, but knowing whether you need hourly, daily or even monthly data to build your predictive model is a must.
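Finer-grained data can always be collapsed into a coarser period, but not the other way around. Here is a minimal sketch of that idea using only the Python standard library. The function name `monthly_last_close` and the sample prices are made up for illustration; they do not come from any of the sources listed in this article.

```python
from datetime import date


def monthly_last_close(daily):
    """Collapse (date, close) pairs into one close per month.

    The value kept for each month is the close of the last day present.
    """
    monthly = {}
    for day, close in sorted(daily):
        # Later days in the same month overwrite earlier ones.
        monthly[(day.year, day.month)] = close
    return monthly


# Made-up sample values, for illustration only.
daily = [
    (date(2023, 1, 30), 101.0),
    (date(2023, 1, 31), 102.5),
    (date(2023, 2, 1), 103.0),
    (date(2023, 2, 28), 99.0),
]

print(monthly_last_close(daily))
```

Going from monthly back to daily is not possible, which is why it pays to decide on the time period before picking a source.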
As a rule of thumb, daily data can generally be downloaded for free, but higher fidelity time periods will likely cost some type of fee.
File Format and API Access
Another factor is whether you need ongoing access to the data through an API (for example, if you need up-to-date data regularly) or whether a stand-alone file is enough.
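Whichever route you take, the data usually ends up as the same rows once it is loaded. The sketch below, using only the Python standard library, parses a small CSV text and a small JSON text into identical records. The column names (`Date`, `Close`, `Volume`) and the values are assumptions chosen for illustration; a real source may name its fields differently.

```python
import csv
import io
import json

# Made-up sample payloads; real sources may use different field names.
CSV_TEXT = """Date,Close,Volume
2023-01-03,100.0,1200
2023-01-04,101.5,1500
"""

JSON_TEXT = (
    '[{"Date": "2023-01-03", "Close": 100.0, "Volume": 1200},'
    ' {"Date": "2023-01-04", "Close": 101.5, "Volume": 1500}]'
)


def normalize(record):
    """Convert one raw record into typed values."""
    return {
        "Date": record["Date"],
        "Close": float(record["Close"]),
        "Volume": int(record["Volume"]),
    }


def rows_from_csv(text):
    """Parse CSV text (as found in a downloaded file) into typed rows."""
    return [normalize(rec) for rec in csv.DictReader(io.StringIO(text))]


def rows_from_json(text):
    """Parse JSON text (as often returned by an API) into typed rows."""
    return [normalize(rec) for rec in json.loads(text)]


print(rows_from_csv(CSV_TEXT) == rows_from_json(JSON_TEXT))
```

Normalizing both formats into one shape early means the rest of a feature pipeline does not need to care where the data came from.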
Standalone files can often be found as CSVs. APIs usually can be accessed using simple Python or R packages. To make things more confusing, you can also use various APIs to retrieve CSVs or JSON files.
Breadth of Data
The actual information that you want to build ML features from also needs to be considered in advance. Some sources include stock price only, while others also include trading volume and adjusted prices that account for dividend payments and stock splits.
Some data sources even have fundamental information gleaned from annual reports and balance sheets. Knowing what you need, and more importantly what you have access to, will give you an idea of which source is right for you.
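To see why adjusted prices matter for features, consider a stock split. The following is a minimal sketch, assuming a single 2-for-1 split and ignoring dividends; the helper names `split_adjust` and `simple_returns` and the prices are made up for illustration, and real providers may use a more involved adjustment method.

```python
def split_adjust(closes, split_index, ratio):
    """Divide every close before split_index by ratio.

    This makes prices before and after the split comparable.
    """
    return [c / ratio if i < split_index else c for i, c in enumerate(closes)]


def simple_returns(prices):
    """Period-over-period percentage change, as a fraction."""
    return [(b - a) / a for a, b in zip(prices, prices[1:])]


# Made-up raw closes; a 2-for-1 split takes effect at index 2.
raw = [200.0, 204.0, 102.0, 103.02]
adjusted = split_adjust(raw, split_index=2, ratio=2.0)

print(simple_returns(raw))       # contains a misleading -50% "return"
print(simple_returns(adjusted))  # the split no longer looks like a crash
```

A model trained on the raw series would learn from a 50% drop that no investor actually experienced, which is the kind of artifact adjusted prices are meant to remove.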
Python Packages for Financial Data
- PyNance – PyNance is a powerful Python package that can help you import both stock and options data with very simple commands. Like many packages, this one utilizes the Yahoo! Finance API. In addition to easily pulling data, it can be used to develop the features and labels necessary for a successful Machine Learning Model.
- QuantPy – This package is currently in development (as of 4 years ago) but does have the ability to access stock pricing information from Yahoo!. I included this package largely because it has such a high-profile name… however, given its development is in jeopardy you may want to look towards other packages first.
- tia – tia stands for ‘Toolkit for Integration and Analysis’ and will allow you to access data from Bloomberg. This package focuses on creating PDF reports, which could be useful when trying to visualize your portfolio’s performance.
- yfinance – As its name implies, this package allows you to pull data from Yahoo!. It should be noted that it is not an official package. This package is well documented, kept up to date, and can be easily used to create features with other data manipulation packages for eventual model generation.
- ffn – ffn, also known as ‘Financial Functions for Python,’ allows you to pull data from Yahoo! and then also gives you the ability to quickly access plots and statistics based on the time period retrieved for a specific ticker.
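As a rough sketch of how one of these packages might feed a feature pipeline, the example below pulls daily closes with yfinance and computes a trailing moving average. It assumes yfinance is installed (`pip install yfinance`) and that network access is available; the helper names `fetch_daily_closes` and `moving_average` are my own and are not part of any package.

```python
def fetch_daily_closes(ticker, start, end):
    """Return (date, close) pairs for one ticker.

    Needs network access and the third-party yfinance package.
    """
    # Imported lazily so the pure helper below works without yfinance.
    import yfinance as yf

    history = yf.Ticker(ticker).history(start=start, end=end, interval="1d")
    return [(ts.date(), float(close)) for ts, close in history["Close"].items()]


def moving_average(values, window):
    """Trailing simple moving average; one output per full window."""
    return [
        sum(values[i - window:i]) / window
        for i in range(window, len(values) + 1)
    ]


# Offline demonstration with made-up values.
print(moving_average([1, 2, 3, 4, 5], window=2))
```

With a connection available, something like `fetch_daily_closes("AAPL", "2023-01-01", "2023-03-01")` would supply real closes to pass into `moving_average`. Because the package is unofficial, its behavior can change when Yahoo! changes its site, so treat this as a starting point.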
R Packages for Financial Data
- Quandl – This is one of the largest services that collects and publishes economic data. Numerous publishers, including the Nasdaq, provide their data either for free or for a fee, depending on the data requested.
- Quantmod – This is yet another very powerful package that gives you access to financial data, tools to develop models, and even functions to help execute trades. This package aims to let you rapidly prototype models for quick analysis and implementation. Data is made available from Yahoo Finance and other sources such as FRED.
- lemonmarkets – If you are interested in using the markets API then this R package will get you well on your way. Although still in development, it appears to be continually updated by the programmer.
- simfinR – This package will allow you to pull in the standard stock pricing information but then goes a step further by providing data on specific fundamentals, such as various items on the company's balance sheet. This package would be great for adding model features based on underlying fundamentals, rather than only the technical information provided by pricing data.
- Rbitcoin – For various cryptocurrency pricing data, this package will help you interface with the Unified Markets API. This package is a bit dated but should still provide the information needed for price-based features.
Stand Alone Financial Data
- Yahoo Finance – Once again Yahoo steals the show. They allow for historical data to be downloaded in the form of a CSV. Some of their data even goes all the way back to 1970. Once you select the stock symbol that you are looking for you can narrow your data selection by time-period and frequency of the sample.
- Intrinio – This paid service allows you to access both stock and options pricing data via bulk download as a CSV and through its API. Using Snowflake, it allows you to sift through a large amount of information quickly. So, if your model requires real-time data then this may be the data source you are looking for.
- Algoseek – Another paid service, Algoseek allows you to buy, lease or even rent the data. This concept of data licensing is somewhat new to me, but it appears that there may be a significant value proposition here if your model does not need long-term access to the underlying data (perhaps you are training a Neural Net of some type).
- io – This website has a number of datasets available as both a CSV and JSON. These datasets would be good for exploratory analysis on market pricing data. Success with these would likely hint towards a specific paid service that could ultimately provide more detailed and accurate data.
- EOD Historical Data – This site offers an API allowing you to download both CSV and JSON formatted datasets. There is a free package that allows up to 20 API calls per day for data requests in the past year. For $79.99 you can make up to 100,000 data calls a day, which should satisfy even the greediest data needs.
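Daily call limits like the ones quoted for EOD Historical Data translate directly into how long a data pull takes. The arithmetic is sketched below, assuming one API call per ticker; that is a simplification of my own, since a real service may count calls differently. The 500-ticker universe is a made-up example.

```python
import math


def days_to_fetch(num_requests, calls_per_day):
    """Number of days needed to make num_requests under a daily cap."""
    return math.ceil(num_requests / calls_per_day)


tickers = 500  # hypothetical universe size, one call per ticker assumed

print(days_to_fetch(tickers, calls_per_day=20))       # free tier quoted above
print(days_to_fetch(tickers, calls_per_day=100_000))  # paid tier quoted above
```

Running the numbers before committing to a source shows quickly whether a free tier is workable for the size of model you have in mind.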
Summary
There are numerous packages for Python and R for getting real-time and historical stock prices, fundamental data, and other information for the development of useful features. By combining a number of these sources (although this is not necessary) you should be able to develop about as sophisticated a Machine Learning model as possible without moving into High Frequency Trading.
If you are interested in getting your stock market data from Robinhood and subsequently making trades on that information, I wrote an article on that topic using Python.
I hope you found this financial data resource page useful. If you have any additional sources you would like to see added to the listing above, just throw them down into the comments below.
Guy Money
As a formally trained Data Scientist I find excitement in writing about Personal Finance and how to view it through a lens filtered by data. I am excited about helping others build financial moats while at the same time helping to make the world a more livable and friendly place.