Crawling the DART electronic disclosure system for stock quant investment

KissCuseMe
2025-03-11

I will explain how to crawl financial statements from DART, the electronic disclosure system, using Python's Requests, BeautifulSoup, and pandas libraries. Be careful in actual use: the website structure may change, which would break the selectors shown here, and excessive requests can put load on the server.
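
To act on the caution about excessive requests, one option is to enforce a minimum gap between consecutive requests. The following is a minimal sketch and not part of the original script: the `Throttle` class is a hypothetical helper, and any interval you pass to it is your own choice, not a limit published by DART.

```python
import time


class Throttle:
    """Enforce a minimum interval between calls to wait() (hypothetical helper)."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock   # Injected so the class can be tested without real delays
        self._sleep = sleep
        self._last = None     # Time of the previous call, None before the first one

    def wait(self):
        """Sleep just long enough that calls are at least min_interval seconds apart."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

A typical use would be to create one `Throttle(1.0)` at module level and call its `wait()` method immediately before every `requests.get(...)` call.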


1. Installing the required libraries

pip install requests beautifulsoup4 pandas openpyxl

2. DART disclosure search and financial statement crawling example

import requests
from bs4 import BeautifulSoup
import pandas as pd
from urllib.parse import urljoin

# Search Criteria (e.g., Samsung Electronics (005930) Annual Report)
COMPANY_CODE = "005930"  # Stock Code
START_DATE = "20230101"  # Search Start Date (YYYYMMDD)
END_DATE = "20231231"    # Search End Date (YYYYMMDD)
REPORT_TYPE = "A001"     # A001: Annual Report, A002: Semi-Annual Report, A003: Quarterly Report

# DART Disclosure Search URL
SEARCH_URL = "http://dart.fss.or.kr/dsab001/search.ax"

def get_report_list():
    """Function to fetch the list of DART disclosure reports"""
    params = {
        "currentPage": 1,
        "maxResults": 10,
        "businessCode": COMPANY_CODE,
        "startDate": START_DATE,
        "endDate": END_DATE,
        "reportName": REPORT_TYPE
    }
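    response = requests.get(SEARCH_URL, params=params, timeout=10)
    response.raise_for_status()  # Fail early on HTTP errors instead of parsing an error page
    soup = BeautifulSoup(response.text, 'html.parser')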
    return soup.select(".table_list tr")[1:]  # Extract rows excluding the header

def extract_excel_url(report_url):
    """Function to extract the Excel file URL from the report page"""
    response = requests.get(report_url, timeout=10)
    response.raise_for_status()  # Fail early on HTTP errors
    soup = BeautifulSoup(response.text, 'html.parser')
    excel_link = soup.select_one("a[href*='download.xbrl']")
    if excel_link:
        return urljoin(report_url, excel_link['href'])
    return None

def download_excel(url):
    """Function to download the Excel file and convert it into a DataFrame"""
    response = requests.get(url, timeout=30)
    response.raise_for_status()  # Do not write an error page to disk as if it were Excel
    with open("temp.xlsx", "wb") as f:
        f.write(response.content)
    return pd.read_excel("temp.xlsx", engine='openpyxl')

# Main Execution
if __name__ == "__main__":
    reports = get_report_list()
    for idx, report in enumerate(reports[:3]):  # Process up to 3 reports
        # Extract report title and link; skip rows that have no link in the third cell
        link = report.select_one("td:nth-child(3) a")
        if link is None:
            continue
        title = link.text.strip()
        report_url = urljoin(SEARCH_URL, link['href'])
        
        print(f"[{idx+1}] Extracting data from {title}...")
        
        # Extract Excel file URL and download
        excel_url = extract_excel_url(report_url)
        if excel_url:
            df = download_excel(excel_url)
            print(df.head())  # Check the data
        else:
            print("Excel file not found.")
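
The search conditions at the top of the script can be checked before any request is sent, so that a typo in a date or report code fails locally instead of producing an empty result list. The following is a sketch only: `build_search_params` is a hypothetical helper, the parameter names are simply the ones used in `get_report_list()` above, and the six-digit check assumes numeric stock codes like 005930.

```python
from datetime import datetime

# Report type codes as listed in the script above
REPORT_TYPES = {"A001": "Annual", "A002": "Semi-annual", "A003": "Quarterly"}


def _parse_yyyymmdd(value):
    """Parse a strict 8-digit YYYYMMDD string into a datetime."""
    if len(value) != 8 or not (value.isascii() and value.isdigit()):
        raise ValueError(f"Date must be 8 digits (YYYYMMDD): {value!r}")
    # strptime raises ValueError for impossible dates such as 20230231
    return datetime.strptime(value, "%Y%m%d")


def build_search_params(company_code, start_date, end_date, report_type, page=1):
    """Validate the inputs and return the query parameters (hypothetical helper)."""
    # Assumption: stock codes are six ASCII digits, e.g. "005930"
    if len(company_code) != 6 or not (company_code.isascii() and company_code.isdigit()):
        raise ValueError(f"Stock code must be 6 digits: {company_code!r}")
    if report_type not in REPORT_TYPES:
        raise ValueError(f"Unknown report type: {report_type!r}")
    if _parse_yyyymmdd(start_date) > _parse_yyyymmdd(end_date):
        raise ValueError("start_date must not be later than end_date")
    return {
        "currentPage": page,
        "maxResults": 10,
        "businessCode": company_code,
        "startDate": start_date,
        "endDate": end_date,
        "reportName": report_type,
    }
```

The returned dictionary can be passed as `params=` to `requests.get`, replacing the hard-coded dictionary in `get_report_list()`.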

3. Code description

  • Search condition settings:
    • COMPANY_CODE: stock code (e.g. Samsung Electronics = 005930)
    • REPORT_TYPE: A001 (annual), A002 (semi-annual), A003 (quarterly)
    • The search is limited to the date range from START_DATE to END_DATE.
  • Report list crawling:
    • Request the DART disclosure search URL to get a list of reports.
    • After parsing the HTML with BeautifulSoup, extract each report's title and link.
  • Excel file extraction:
    • On each report page, find the download link (an href containing download.xbrl) and download the file.
    • Read the downloaded file with pandas into a DataFrame.
  • Cautions:
    • Dynamic content: Some pages may be loaded dynamically with JavaScript. In that case you may need Selenium.
    • Data consistency: The Excel file structure may differ from company to company, so you need to add column mapping logic.
    • Legal restrictions: You must comply with DART's terms and conditions when crawling.
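
The Excel extraction step assumes that the download link really returns an .xlsx workbook. If the server returns an HTML error page or a different kind of archive instead, the failure only shows up later inside `pd.read_excel`. A minimal sketch of a pre-check, assuming the conventional .xlsx layout (a ZIP archive containing `xl/workbook.xml`); `looks_like_xlsx` is a hypothetical helper, not part of the original script:

```python
import io
import zipfile


def looks_like_xlsx(content: bytes) -> bool:
    """Return True if the bytes look like an .xlsx workbook (hypothetical helper)."""
    buffer = io.BytesIO(content)
    if not zipfile.is_zipfile(buffer):
        return False  # e.g. an HTML error page or a legacy .xls file
    buffer.seek(0)
    with zipfile.ZipFile(buffer) as archive:
        # Assumption: the workbook part sits at its conventional path
        return "xl/workbook.xml" in archive.namelist()
```

In `download_excel`, this check could be applied to `response.content` before the file is written, printing a message or raising an error when it returns False.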

Based on this code, you can implement additional data preprocessing and quant analysis logic.
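
One such preprocessing step is the column mapping mentioned in the cautions. The sketch below builds a rename table from a list of accepted spellings per canonical column name. The `ALIASES` entries are illustrative examples only, not taken from actual DART files, and `build_rename_map` is a hypothetical helper.

```python
# Canonical name -> accepted spellings (illustrative values, not from real DART files)
ALIASES = {
    "revenue": ["Revenue", "Sales", "매출액"],
    "operating_income": ["Operating income", "Operating profit", "영업이익"],
}


def _normalize(name):
    """Lower-case a column name and drop all whitespace so spellings compare equal."""
    return "".join(str(name).split()).lower()


def build_rename_map(columns, aliases):
    """Return {raw column name: canonical name} for every column found in aliases."""
    lookup = {}
    for canonical, names in aliases.items():
        for name in names:
            lookup[_normalize(name)] = canonical
    # Columns without a known alias are left out and therefore keep their name
    return {col: lookup[_normalize(col)] for col in columns if _normalize(col) in lookup}
```

With a DataFrame returned by `download_excel`, the mapping could be applied as `df = df.rename(columns=build_rename_map(df.columns, ALIASES))`.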

