// guide
Backtesting Insider Trades on QuantConnect with Form4API
Updated 2026-06-15. By Theodor Nielsen, founder of Form4API.
// answer
You can backtest SEC Form 4 insider trades in QuantConnect by exporting them as CSV from Form4API's /v1/transactions/export endpoint and importing them as a custom data source in a LEAN algorithm. The one rule that matters: key every signal on its filing date (filedAt), not the transaction date, or your backtest leaks lookahead bias because a trade is not public until the Form 4 is filed.
QuantConnect has no built-in insider dataset
QC's data library covers price history, fundamentals, and a growing catalog of alternative datasets, but it has no SEC Form 4 insider transactions. You bring your own via LEAN's custom data API. Form4API's CSV export is built for exactly this use case: one flat file per ticker, in chronological order by filing date, with all the fields you need for a backtest.
The alternative is parsing raw EDGAR filings yourself (ownership XML wrapped in SGML submission files). That means chasing filing index pages, handling amended filings (Form 4/A), deduplicating derivative and non-derivative tables, and normalizing transaction codes, all before you can write a single line of strategy code. The export endpoint skips all of that.
Step 1: Export the transactions as CSV
Use GET /v1/transactions/export (Business plan or higher). It accepts the same filters as /v1/transactions: ticker, from/to date range, transaction codes, category, significant flag, and value/share bounds. The response streams CSV ascending by filedAt, capped at 100,000 rows per request. For full history, chunk by date, for example one calendar year at a time.
curl -H "X-Api-Key: $FORM4API_KEY" \
  "https://api.form4api.com/v1/transactions/export?ticker=AAPL&from=2020-01-01&to=2024-12-31" \
  -o aapl-insider.csv
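The year-at-a-time chunking can be sketched in Python as follows. This is a minimal sketch, not part of Form4API: the helper names `yearly_windows` and `export_urls` are hypothetical, and it assumes the `to` bound is inclusive, which you should confirm in the export reference. It only builds the URLs; each one would still be fetched with the X-Api-Key header, as in the curl command.

```python
from datetime import date
from urllib.parse import urlencode

# Endpoint taken from the curl example; the helpers below are illustrative.
EXPORT_URL = "https://api.form4api.com/v1/transactions/export"


def yearly_windows(start: date, end: date) -> list[tuple[str, str]]:
    """Split [start, end] into calendar-year (from, to) windows."""
    if start > end:
        raise ValueError("start must not be after end")
    windows = []
    for year in range(start.year, end.year + 1):
        lo = max(start, date(year, 1, 1))
        hi = min(end, date(year, 12, 31))
        windows.append((lo.isoformat(), hi.isoformat()))
    return windows


def export_urls(ticker: str, start: date, end: date) -> list[str]:
    """Build one export URL per calendar year using the ticker/from/to filters."""
    return [
        f"{EXPORT_URL}?{urlencode({'ticker': ticker, 'from': lo, 'to': hi})}"
        for lo, hi in yearly_windows(start, end)
    ]


urls = export_urls("AAPL", date(2020, 1, 1), date(2024, 12, 31))
```

If a single year ever approaches the 100,000-row cap, the same idea applies with narrower windows, such as quarters.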
The CSV has 27 columns. They include the post-trade return horizons (return1d, return1w, return1m, return3m, return6m), which are useful for research but must not be used as live features in a backtest (more on that in the caution section below). The one column that matters most for backtesting is filedAt.
The one rule: key on filedAt, not transactionDate
A Form 4 is due within two business days of the underlying trade, and the trade is not knowable to the market until the Form 4 appears on EDGAR. Here is a concrete example from real AAPL data: a transaction dated 2026-02-01 did not appear on EDGAR until filedAt 2026-02-03. If your backtest acts on the transaction date, it is trading on information that was not yet public. That is the classic lookahead bias that makes a strategy look profitable in backtest and fail live.
Always key on filedAt. In LEAN, set the custom data point's time to filedAt so the engine never delivers the row before the trade became public. This is the most important thing to get right when backtesting any regulatory filing data, and it is easy to get wrong if you reach for the more intuitively named transactionDate column first.
The Form4API export is sorted ascending by filedAt rather than transactionDate, so the file order already follows the timeline on which the information became public.
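To make the leak concrete, here is a small sketch using the dates from the AAPL example. The `visible_rows` helper is hypothetical and stands in for whatever a backtest treats as "known" on a given day. The row is a plain dict, not a real export record.

```python
from datetime import date

# One row using the transaction and filing dates from the AAPL example.
rows = [{"transactionDate": date(2026, 2, 1), "filedAt": date(2026, 2, 3)}]


def visible_rows(rows, as_of, key):
    """Rows a backtest would treat as known on `as_of` when keyed on `key`."""
    return [r for r in rows if r[key] <= as_of]


as_of = date(2026, 2, 2)  # one day before the Form 4 reached EDGAR

# Keyed on transactionDate, the backtest already "sees" the trade: lookahead.
leaky = visible_rows(rows, as_of, key="transactionDate")

# Keyed on filedAt, the trade stays hidden until it is actually public.
correct = visible_rows(rows, as_of, key="filedAt")
```

On 2026-02-02, `leaky` contains the row and `correct` is empty. The two only agree from 2026-02-03 onward.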
Step 2: Define the LEAN custom data class
Subclass PythonData, point get_source at the export URL for the symbol, and parse each CSV line in reader, setting time to the filing date. The following is the current QuantConnect Python API:
from AlgorithmImports import *
import csv
from datetime import datetime, timedelta


class Form4Insider(PythonData):
    def get_source(self, config, date, is_live_mode):
        # One CSV per ticker, exported from Form4API (filedAt-ascending).
        url = f"https://api.form4api.com/v1/transactions/export?ticker={config.symbol.value}&from=2015-01-01"
        return SubscriptionDataSource(url, SubscriptionTransportMedium.REMOTE_FILE)

    def reader(self, config, line, date, is_live_mode):
        # Skip blank lines and the header row.
        if not line or not line.strip() or line.startswith("filedAt"):
            return None
        # Parse with the csv module in case the export quotes fields that
        # contain commas; a plain split(",") would shift the columns.
        cols = next(csv.reader([line]))
        row = Form4Insider()
        row.symbol = config.symbol
        # Column 0 is filedAt -- the point in time the trade became public.
        # Keying time on filedAt (not transactionDate, column 1) is what
        # prevents lookahead bias.
        row.time = datetime.strptime(cols[0], "%Y-%m-%d")
        row.end_time = row.time + timedelta(days=1)
        # transactionCode is column 10; isBuyOrSell is column 11.
        code = cols[10]
        # +1 for a purchase (P), -1 for a sale (S), 0 for every other code
        # (grants, gifts, option exercises, tax withholding).
        row.value = 1 if code == "P" else -1 if code == "S" else 0
        row["TransactionCode"] = code
        row["Shares"] = float(cols[15]) if cols[15] else 0.0
        return row

A note on key handling: the export endpoint requires authentication, so in practice you should download the CSV to QuantConnect's Object Store (or a hosted URL with the key embedded server-side) rather than calling the live API directly from get_source. LEAN custom-data URLs cannot carry secret request headers, so the code above shows the shape of the solution; swap the URL in get_source for an Object Store key or a pre-signed hosted URL in your actual algorithm.
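If you download the history in yearly chunks before uploading it, the chunks need to be stitched into one file per ticker. The following is a minimal sketch of that step, assuming every chunk carries the same header row and that filedAt values are ISO-formatted so they sort correctly as strings. `merge_chunks` is a hypothetical helper, not part of Form4API or LEAN.

```python
import csv
import io


def merge_chunks(chunks: list[str]) -> str:
    """Merge CSV export chunks into one CSV with a single header, ascending by filedAt."""
    header = None
    rows = []
    for text in chunks:
        reader = csv.reader(io.StringIO(text))
        chunk_header = next(reader, None)
        if chunk_header is None:
            continue  # empty chunk, nothing to merge
        if header is None:
            header = chunk_header
        elif chunk_header != header:
            raise ValueError("chunks have different headers")
        rows.extend(r for r in reader if r)  # drop blank lines
    if header is None:
        return ""
    filed_idx = header.index("filedAt")
    # Assumes ISO-formatted filedAt values; the sort is stable, so rows
    # filed on the same day keep their original relative order.
    rows.sort(key=lambda r: r[filed_idx])
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return out.getvalue()
```

Re-sorting by filedAt after the merge keeps the file in the order the custom data reader expects, even if the chunks were downloaded out of order.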
Step 3: Use it in an algorithm
In initialize, register the custom data per ticker alongside the equity subscription. In on_data, act when an insider buy becomes available: a positive value field means an open-market purchase (code P) just became public on the filing date.
class InsiderBuyAlgorithm(QCAlgorithm):
    def initialize(self):
        self.set_start_date(2020, 1, 1)
        self.set_end_date(2024, 12, 31)
        self.set_cash(100000)
        self.tickers = ["AAPL", "NVDA", "MSFT"]
        self.equity = {}
        self.insider = {}
        for t in self.tickers:
            # Keep both symbols: the equity to trade and the custom data feed.
            self.equity[t] = self.add_equity(t, Resolution.DAILY).symbol
            self.insider[t] = self.add_data(Form4Insider, t, Resolution.DAILY).symbol

    def on_data(self, slice):
        for t in self.tickers:
            sym = self.insider[t]
            if sym in slice and slice[sym].value > 0:
                # An insider open-market buy just became public (filedAt). Go long.
                self.set_holdings(self.equity[t], 0.33)

This is a minimal example. In production you would layer in quality filters: minimum dollar value per transaction, officer/director role filter, exclusion of 10b5-1 plan trades, and a position sizing model that accounts for the number of insider buys in the same filing period. The Form4API export exposes all of those fields in the same flat row, so you can implement each filter directly in reader or in on_data.
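As a rough illustration of those quality filters, the sketch below checks a single row, represented as a dict of column name to raw CSV string. The column names `isOfficer`, `isDirector`, and `is10b51Plan` are hypothetical placeholders, as is the 100,000 threshold; look up the real column names in the export reference before using anything like this.

```python
def _truthy(raw) -> bool:
    """Interpret a raw CSV cell as a boolean flag."""
    return str(raw or "").strip().lower() in {"true", "1", "yes"}


def passes_quality_filters(row: dict, min_value: float = 100_000.0) -> bool:
    """Return True for purchases that clear the value, role, and 10b5-1 filters.

    `isOfficer`, `isDirector`, and `is10b51Plan` are HYPOTHETICAL column
    names used for illustration only.
    """
    if row.get("transactionCode") != "P":
        return False
    try:
        value = float(row.get("value") or 0.0)
    except ValueError:
        return False  # unparseable dollar value: treat as not passing
    if value < min_value:
        return False
    if not (_truthy(row.get("isOfficer")) or _truthy(row.get("isDirector"))):
        return False
    # Exclude trades flagged as executed under a 10b5-1 plan.
    return not _truthy(row.get("is10b51Plan"))
```

A function like this could be called from reader to drop rows early, or from on_data if you prefer to keep every row and decide at trade time.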
A caution on the return columns
The export includes return1d, return1w, return1m, return3m, and return6m columns. These are computed after the fact, for research and labeling. They answer the question: given that this insider bought on this date, how did the stock perform over the following period?
Do not feed them into a backtest as live features. They are forward-looking by construction: the return1m column for a trade filed on 2024-01-15 contains the return up to 2024-02-15. Your backtest running on 2024-01-15 cannot know that value. Using it would be the worst kind of lookahead bias, one that is invisible in the code but catastrophic in the result.
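One defensive option is to strip the return columns from the file before it ever reaches the backtest, so they cannot be read by accident. A minimal sketch, assuming the header uses the column names listed above (`strip_return_columns` is a hypothetical helper):

```python
import csv
import io

# The forward-looking columns named in this guide.
FORWARD_LOOKING = {"return1d", "return1w", "return1m", "return3m", "return6m"}


def strip_return_columns(csv_text: str) -> str:
    """Return the CSV with the post-trade return columns removed."""
    reader = csv.DictReader(io.StringIO(csv_text))
    kept = [c for c in (reader.fieldnames or []) if c not in FORWARD_LOOKING]
    if not kept:
        return ""
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=kept, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)  # extrasaction="ignore" drops the return columns
    return out.getvalue()
```

Removing columns can shift the positions of the ones that follow, so any reader that indexes columns by position should be re-checked against the stripped file.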
The right use for these columns is offline analysis: study which insider signals (by role, transaction code, dollar value, or sector) historically preceded strong forward returns. Then build your live backtest signal from the point-in-time fields only: filedAt, transactionCode, shares, value.
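For the offline side, a first pass can be as simple as averaging one return horizon per transaction code. The sketch below uses only the standard library and assumes the header names used in this guide. `mean_return_by_code` is a hypothetical helper, and it makes no assumption about whether returns are stored as fractions or percentages.

```python
import csv
import io
from collections import defaultdict
from statistics import mean


def mean_return_by_code(csv_text: str, horizon: str = "return1m") -> dict[str, float]:
    """Average forward return per transactionCode, skipping blank or non-numeric cells."""
    buckets = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        raw = (row.get(horizon) or "").strip()
        if not raw:
            continue  # horizon not yet elapsed or not computed
        try:
            buckets[row.get("transactionCode", "")].append(float(raw))
        except ValueError:
            continue  # ignore cells that are not numbers
    return {code: mean(vals) for code, vals in buckets.items()}
```

This belongs in a research notebook, not in the algorithm: its output tells you which signals to build, and the backtest itself then uses only the point-in-time fields.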
What is supported today
Form 4 transactions are the supported export today. They key cleanly to a ticker because every Form 4 filing names the issuer by ticker and CIK. The 13F institutional holdings export is not yet available: only about a third of 13F holdings CUSIPs map cleanly to a ticker, so the data is not ready for the same flat-file treatment.
Pull a Business key from the dashboard and read the export reference in the API docs to see the full list of filter parameters and column definitions.
Frequently asked questions
Does QuantConnect have a built-in insider trading dataset?
No. QC has price, fundamental, and some alternative datasets, but no SEC Form 4 insider transactions. You add them yourself with LEAN's custom data API, fed by Form4API's CSV export.
How do I avoid lookahead bias when backtesting insider trades?
Key every signal on filedAt, the date the Form 4 was filed on EDGAR and the trade became public, not transactionDate. A Form 4 is due within two business days of the trade. In LEAN, set your custom data point's time to filedAt so the engine never delivers it before the disclosure date.
What Form4API plan do I need for the CSV export?
The bulk CSV export (/v1/transactions/export) requires the Business plan or higher. The paginated JSON endpoint (/v1/transactions) is available on Free and Pro.
How much history can I export in one request?
Up to 100,000 rows per request. For longer spans, pass a from/to window and pull in chunks, for example one calendar year at a time. The export streams ascending by filedAt.
Can I backtest 13F institutional holdings the same way?
Not yet. The holdings export is not available because only about a third of 13F holdings CUSIPs map cleanly to a ticker. Form 4 transactions, which key directly to a ticker, are the supported export today.
Are the post-trade return columns safe to use as backtest features?
No. The return1d through return6m columns are computed after the trade for research and labeling and are forward-looking by construction. Using them as live features is lookahead bias. Use only point-in-time fields (filedAt, transactionCode, shares, value) in a backtest, and use the return columns to study historical signal quality.