How to Crawl DART Electronic Disclosure System for Stock Quant Investing

Table of Contents

  • 1. Install Required Libraries
  • 2. Example of DART Disclosure Search and Financial Statement Crawling
  • 3. Code Explanation


KissCuseMe
2025-03-11

I will explain step by step how to crawl financial statement data from the DART electronic disclosure system, using Python's requests, BeautifulSoup, and pandas libraries. Keep in mind that the site's HTML structure may change at any time, and avoid sending excessive requests, which put unnecessary strain on the server.
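To keep request volume low, a small rate limiter can enforce a minimum interval between successive requests. This is a minimal sketch; the interval value is an arbitrary polite-crawling choice, not a DART requirement:

```python
import time

class RateLimiter:
    """Enforce a minimum interval between successive requests."""

    def __init__(self, min_interval=1.0):
        self.min_interval = min_interval  # seconds between calls (arbitrary choice)
        self._last = 0.0

    def wait(self):
        """Sleep just long enough so calls are at least min_interval apart."""
        now = time.monotonic()
        elapsed = now - self._last
        if elapsed < self.min_interval:
            time.sleep(self.min_interval - elapsed)
        self._last = time.monotonic()
```

Calling `limiter.wait()` immediately before each `requests.get` in the code below would space the requests out without changing any other logic.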


1. Install Required Libraries

pip install requests beautifulsoup4 pandas openpyxl

2. Example of DART Disclosure Search and Financial Statement Crawling

import requests
from bs4 import BeautifulSoup
import pandas as pd
from urllib.parse import urljoin


# Search Criteria (e.g., Samsung Electronics (005930) Annual Report)
COMPANY_CODE = "005930"  # Stock Code
START_DATE = "20230101"  # Search Start Date (YYYYMMDD)
END_DATE = "20231231"    # Search End Date (YYYYMMDD)
REPORT_TYPE = "A001"     # A001: Annual Report, A002: Semi-Annual Report, A003: Quarterly Report


# DART Disclosure Search URL
SEARCH_URL = "https://dart.fss.or.kr/dsab001/search.ax"

def get_report_list():
    """Function to fetch the list of DART disclosure reports"""
    params = {
        "currentPage": 1,
        "maxResults": 10,
        "businessCode": COMPANY_CODE,
        "startDate": START_DATE,
        "endDate": END_DATE,
        "reportName": REPORT_TYPE
    }
    response = requests.get(SEARCH_URL, params=params, timeout=10)
    soup = BeautifulSoup(response.text, 'html.parser')
    return soup.select(".table_list tr")[1:]  # Extract rows excluding the header

def extract_excel_url(report_url):
    """Function to extract the Excel file URL from the report page"""
    response = requests.get(report_url, timeout=10)
    soup = BeautifulSoup(response.text, 'html.parser')
    excel_link = soup.select_one("a[href*='download.xbrl']")
    if excel_link:
        return urljoin(report_url, excel_link['href'])
    return None

def download_excel(url):
    """Function to download the Excel file and convert it into a DataFrame"""
    response = requests.get(url, timeout=30)
    with open("temp.xlsx", "wb") as f:
        f.write(response.content)
    return pd.read_excel("temp.xlsx", engine='openpyxl')


# Main Execution
if __name__ == "__main__":
    reports = get_report_list()
    for idx, report in enumerate(reports[:3]):  # Process up to 3 reports
        # Extract report title and link from the third table cell
        link = report.select_one("td:nth-child(3) a")
        title = link.text.strip()
        report_url = urljoin(SEARCH_URL, link['href'])
        
        print(f"[{idx+1}] Extracting data from {title}...")
        
        # Extract Excel file URL and download
        excel_url = extract_excel_url(report_url)
        if excel_url:
            df = download_excel(excel_url)
            print(df.head())  # Check the data
        else:
            print("Excel file not found.")

3. Code Explanation

  • Setting Search Criteria:
    • COMPANY_CODE: Stock code (e.g., Samsung Electronics=005930)
    • REPORT_TYPE: A001 (Annual), A002 (Semi-Annual), A003 (Quarterly)
    • The date range is limited by START_DATE and END_DATE.
  • Crawling Report List:
    • Call the DART disclosure search API to retrieve the report list.
    • Extract the report title and link after parsing the HTML with BeautifulSoup.
  • Extract Excel File:
    • Find the XBRL-format Excel download link on each report page and download the file.
    • Read the downloaded file with pandas and convert it into a DataFrame.
  • Precautions:
    • Dynamic Content Handling: Some pages may be dynamically loaded with JavaScript. In this case, you may need to use Selenium.
    • Data Consistency: The Excel file structure may vary from company to company, so you need to add column mapping logic.
    • Legal Restrictions: You must comply with the DART Terms of Use when web crawling.

You can implement additional data preprocessing and quant analysis logic based on this code.
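As one example of such follow-on preprocessing, a simple year-over-year growth metric could be derived from the crawled figures. The column names below are a hypothetical schema, not what DART's Excel files actually contain:

```python
import pandas as pd

def add_yoy_growth(df, cur="current_period", prev="prior_period"):
    """Add a year-over-year growth-rate column; assumes numeric
    current/prior period columns (hypothetical schema)."""
    df = df.copy()
    df["yoy_growth"] = (df[cur] - df[prev]) / df[prev]
    return df
```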

Stocks
Quant
Investment
Crawling
DART (Data Analysis, Retrieval and Transfer System)


