Spring Boot 메모리 걱정 없이 대용량 데이터 다루기 #2

안녕하세요, Devloo입니다. 🙂 지난번에 작성한 “Spring Boot 메모리 걱정 없이 대용량 데이터 다루기 #1 — JPA 사용“이라는 글에서 Spring Boot를 사용하여 메모리보다 많은 데이터를 반환하는 REST 엔드포인트를 만드는 방법을 설명했습니다. 이번 글에서는 JPA 대신 Spring Boot JDBC를 사용하여 동일한 목표를 어떻게 달성할 수 있는지 알아보겠습니다.

Spring Boot JDBC: 메모리 걱정 없이 대용량 데이터 다루기 — 사진: Unsplash의Raul Hender

>> 목차

시나리오

먼저, 백만 개의 주문(Orders)과 5백만 개 이상의 주문 항목(OrderItems)을 가진 동일한 데이터베이스를 재사용하겠습니다.

데이터를 저장하는 방법을 단순화하기 위해 간단한 DTO를 사용해 보겠습니다.

public record ReportDto(
    Long orderId,
    LocalDate date,
    String customerName,
    BigDecimal totalAmount,
    String currency,
    String status,
    String paymentMethod,
    Long productId,
    String productName,
    Integer quantity,
    BigDecimal price,
    BigDecimal itemTotalAmount
) {}

이제 데이터를 가져오기 위해 다음 SQL 쿼리를 사용할 것이며, JdbcTemplate 클래스를 사용해 쿼리를 실행하겠습니다:

SELECT o.id, o.date, c.first_name, o.total_amount, o.currency, o.status, 
       o.payment_method, oi.product_id, p.name, oi.quantity, oi.price, 
       oi.total_amount
  FROM orders o
  JOIN customers c ON c.id = o.customer_id
  JOIN order_items oi ON oi.order_id = o.id
  JOIN products p ON p.id = oi.product_id

시도 1: `queryForList` 메서드 사용하기

queryForList 메서드를 사용하면 쿼리 결과를 필드-값 쌍을 포함하는 맵의 리스트로 반환하지만, 이는 메모리 문제를 일으킬 수 있습니다.

@RestController
@RequiredArgsConstructor
public class ReportController {
  private final ReportService reportService;

  @GetMapping("/v4/report")
  public ResponseEntity<List<ReportDto>> report4() {
    final var body = reportService.getResult();
    return ResponseEntity.ok(body);
  }
}

@Service
@RequiredArgsConstructor
public class ReportService {
  private final JdbcTemplate jdbcTemplate;
  private static final String QUERY = "SELECT o.id, o.date, c.first_name, o.total_amount,       
                                       o.currency, o.status, " +
                                      "o.payment_method, oi.product_id, p.name, oi.quantity, 
                                       oi.price, " +
                                      "oi.total_amount " +
                                      "FROM orders o " +
                                      "JOIN customers c ON c.id = o.customer_id " +
                                      "JOIN order_items oi ON oi.order_id = o.id " +
                                      "JOIN products p ON p.id = oi.product_id";

  public List<ReportDto> getResult() {
    List<Map<String, Object>> queryResult = jdbcTemplate.queryForList(QUERY);
    return toReportDto(queryResult);
  }

  private List<ReportDto> toReportDto(List<Map<String, Object>> queryResult) {
    return queryResult.stream()
                      .map(row -> new ReportDto(
                          (Long) row.get("id"),
                          (LocalDate) row.get("date"),
                          (String) row.get("first_name"),
                          (BigDecimal) row.get("total_amount"),
                          (String) row.get("currency"),
                          (String) row.get("status"),
                          (String) row.get("payment_method"),
                          (Long) row.get("product_id"),
                          (String) row.get("name"),
                          (Integer) row.get("quantity"),
                          (BigDecimal) row.get("price"),
                          (BigDecimal) row.get("total_amount")))
                      .toList();
  }
}

public record ReportDto(
//...생략...//
) {}

예상한 대로, 결과 전체를 리스트에 저장하면 메모리 부족 오류가 발생합니다:

Servlet.service() for servlet [dispatcherServlet] in context with path [] 
  threw exception [Handler dispatch failed: java.lang.OutOfMemoryError: 
    Java heap space] with root cause

java.lang.OutOfMemoryError: Java heap space

시도 2: `RowMapper`와 함께 `query` 메서드 사용하기

일반적인 결과를 반환한 후 DTO로 매핑하는 대신, 쿼리 처리 중에 RowMapper를 사용하여 매핑합니다.

@RestController
@RequiredArgsConstructor
public class ReportController {
  private final ReportService reportService;

  @GetMapping("/v5/report")
  public ResponseEntity<List<ReportDto>> report5() {
    final var body = reportService.getResult();
    return ResponseEntity.ok(body);
  }
}

@Service
@RequiredArgsConstructor
public class ReportService {
  private final JdbcTemplate jdbcTemplate;
  private static final String QUERY = """
    SELECT o.id, o.date, c.first_name, o.total_amount, o.currency, o.status, 
           o.payment_method, oi.product_id, p.name, oi.quantity, oi.price, 
           oi.total_amount 
      FROM orders o 
      JOIN customers c ON c.id = o.customer_id 
      JOIN order_items oi ON oi.order_id = o.id 
      JOIN products p ON p.id = oi.product_id
    """;

  public List<ReportDto> getResult() {
    return jdbcTemplate.query(QUERY, rowMapper);
  }

  private final RowMapper<ReportDto> rowMapper = (rs, rowNum) -> new ReportDto(
      rs.getLong("id"),
      rs.getDate("date").toLocalDate(),
      rs.getString("first_name"),
      rs.getBigDecimal("total_amount"),
      rs.getString("currency"),
      rs.getString("status"),
      rs.getString("payment_method"),
      rs.getLong("product_id"),
      rs.getString("name"),
      rs.getInt("quantity"),
      rs.getBigDecimal("price"),
      rs.getBigDecimal("total_amount")
  );
}

public record ReportDto(
//...생략...//
) {}

이 접근 방식은 작동하지만, 쿼리가 DTO로 매핑되기 전에 모든 데이터를 읽기 때문에 메모리 부족 오류가 발생합니다.

시도 3: `queryForStream` 메서드 사용하기

데이터베이스에서 스트림을 반환할 수 있는 queryForStream 메서드를 사용해 봅니다. 그러나 이 구현도 이전 시도들과 유사하게 실패했습니다.

@RestController
@RequiredArgsConstructor
public class ReportController {
  private final ReportService reportService;

  @GetMapping("/v6/report")
  public ResponseEntity<List<ReportDto>> report6() {
    final var body = reportService.getResult();
    return ResponseEntity.ok(body);
  }
}

@Service
@RequiredArgsConstructor
public class ReportService {
  private final JdbcTemplate jdbcTemplate;
  private static final String QUERY = """
    SELECT o.id, o.date, c.first_name, o.total_amount, o.currency, o.status, 
           o.payment_method, oi.product_id, p.name, oi.quantity, oi.price, 
           oi.total_amount 
      FROM orders o 
      JOIN customers c ON c.id = o.customer_id 
      JOIN order_items oi ON oi.order_id = o.id 
      JOIN products p ON p.id = oi.product_id
    """;

  public List<ReportDto> getResult() {
    try (var stream = jdbcTemplate.queryForStream(QUERY, rowMapper)) {
      return stream.collect(Collectors.toList());
    }
  }

  private final RowMapper<ReportDto> rowMapper = (rs, rowNum) -> new ReportDto(
      rs.getLong("id"),
      rs.getDate("date").toLocalDate(),
      rs.getString("first_name"),
      rs.getBigDecimal("total_amount"),
      rs.getString("currency"),
      rs.getString("status"),
      rs.getString("payment_method"),
      rs.getLong("product_id"),
      rs.getString("name"),
      rs.getInt("quantity"),
      rs.getBigDecimal("price"),
      rs.getBigDecimal("total_amount")
  );
}

public record ReportDto(
//...생략...//
) {}

시도 4: `RowCallbackHandler`와 함께 `query` 메서드 사용하기

JdbcTemplate에서 RowCallbackHandler를 사용하는 query 메서드를 사용해 하나씩 행을 처리하고 DTO로 매핑하여 메모리 낭비를 방지합니다. 또한, fetch size를 1000으로 설정하여 한 번에 1000개의 결과를 가져오도록 합니다.

@RestController
@RequiredArgsConstructor
public class ReportController {
  private final ReportService reportService;

  @GetMapping("/v7/report")
  public ResponseEntity<List<ReportDto>> report7() {
    final var body = reportService.getResult();
    return ResponseEntity.ok(body);
  }
}

@Service
@RequiredArgsConstructor
public class ReportService {
  private final JdbcTemplate jdbcTemplate;
  private static final String QUERY = """
    SELECT o.id, o.date, c.first_name, o.total_amount, o.currency, o.status, 
           o.payment_method, oi.product_id, p.name, oi.quantity, oi.price, 
           oi.total_amount 
      FROM orders o 
      JOIN customers c ON c.id = o.customer_id 
      JOIN order_items oi ON oi.order_id = o.id 
      JOIN products p ON p.id = oi.product_id
    """;

  public List<ReportDto> getResult() {
    jdbcTemplate.setFetchSize(1000);
    final List<ReportDto> result = new ArrayList<>();

    RowCallbackHandler rowCallbackHandler = rs -> {
      while (rs.next()) {
        result.add(mapToReportDto(rs));
      }
    };

    jdbcTemplate.query(QUERY, rowCallbackHandler);
    return result;
  }

  private ReportDto mapToReportDto(ResultSet rs) throws SQLException {
    return new ReportDto(
        rs.getLong("id"),
        rs.getDate("date").toLocalDate(),
        rs.getString("first_name"),
        rs.getBigDecimal("total_amount"),
        rs.getString("currency"),
        rs.getString("status"),
        rs.getString("payment_method"),
        rs.getLong("product_id"),
        rs.getString("name"),
        rs.getInt("quantity"),
        rs.getBigDecimal("price"),
        rs.getBigDecimal("total_amount")
    );
  }
}

public record ReportDto(
//...생략...//
) {}

그러나 이 방법도 충분하지 않았습니다.

마지막: RowCallbackHandler와 쿼리 메서드를 사용하여 데이터 스트리밍하기

이 문제를 해결하려면 데이터베이스에서 데이터를 쿼리하는 메서드를 변경하는 것 이상이 필요합니다. 우리는 데이터를 스트리밍해야 합니다. 변경 사항은 다음과 같습니다:

쿼리 결과를 한 행씩 처리할 수 있는 JdbcTemplate query 메서드를 사용합니다.
REST 엔드포인트에서 데이터를 청크 단위로 클라이언트에 전송할 수 있도록 StreamingResponseBody 클래스를 사용합니다.

@RestController
@RequiredArgsConstructor
public class ReportController {

  private final ReportService reportService;

  @GetMapping("/v7/report")
  public ResponseEntity<StreamingResponseBody> report7() {
    final var body = reportService.getResult();
    return ResponseEntity.ok(body);
  }
}

@Service
@RequiredArgsConstructor
public class ReportService {
  private final JdbcTemplate jdbcTemplate;
  private final ObjectMapper objectMapper = new ObjectMapper();
  private static final String QUERY = """
    SELECT o.id, o.date, c.first_name, o.total_amount, o.currency, o.status, 
           o.payment_method, oi.product_id, p.name, oi.quantity, oi.price, 
           oi.total_amount 
      FROM orders o 
      JOIN customers c ON c.id = o.customer_id 
      JOIN order_items oi ON oi.order_id = o.id 
      JOIN products p ON p.id = oi.product_id
    """;

  public StreamingResponseBody getResult() {
    jdbcTemplate.setFetchSize(1000);

    return outputStream -> {
      jdbcTemplate.query(
          QUERY,
          rs -> {
            while (rs.next()) {
              try {
                var json = objectMapper.writeValueAsString(mapToReportDto(rs));
                outputStream.write(json.getBytes(StandardCharsets.UTF_8));
              } catch (final IOException e) {
                throw new RuntimeException(e);
              }
            }
          });
    };
  }

  private ReportDto mapToReportDto(ResultSet rs) throws SQLException {
    return new ReportDto(
        rs.getLong("id"),
        rs.getDate("date").toLocalDate(),
        rs.getString("first_name"),
        rs.getBigDecimal("total_amount"),
        rs.getString("currency"),
        rs.getString("status"),
        rs.getString("payment_method"),
        rs.getLong("product_id"),
        rs.getString("name"),
        rs.getInt("quantity"),
        rs.getBigDecimal("price"),
        rs.getBigDecimal("total_amount")
    );
  }
}

public record ReportDto(
//...생략...//
) {}

진행한 작업:

컨트롤러는 결과 데이터를 스트리밍합니다.
서비스는 fetch size를 1000으로 설정하여 쿼리가 한 번에 1000개의 행을 데이터베이스에서 반환하도록 합니다. 그렇지 않으면 JdbcTemplate 클래스는 모든 데이터를 한 번에 가져오려고 하여 메모리 부족 오류를 일으킬 수 있습니다.
각 결과 집합을 DTO로 매핑하고, 이를 JSON으로 변환한 다음 출력 스트림에 씁니다.

이제 목표를 달성했습니다. 이 엔드포인트는 5백만 개의 레코드를 30초 만에 반환할 수 있었습니다!

결론

JdbcTemplate을 사용하면 JPA보다 몇 가지 작업을 수동으로 처리해야 하지만, 복잡한 SQL 쿼리를 보다 쉽게 작성하고 실행할 수 있다는 장점이 있습니다. 예를 들어, DTO 매핑 작업은 직접 처리해야 하지만, 이는 복잡한 데이터를 효과적으로 가져오는 데 큰 도움이 됩니다.

물론, 대부분의 경우에는 JPA를 사용하는 것이 더 쉽고 실수를 줄이는 데 유리합니다. JPA는 구현이 간단하고 자동으로 많은 작업을 처리해 주기 때문에, 빠른 개발과 유지보수에 유리합니다. 하지만 때로는 JdbcTemplate이 필요할 때가 있습니다. 특히, 복잡한 쿼리를 빠르게 실행해야 할 때는 JdbcTemplate이 더 적합할 수 있습니다.

중요한 점은 어떤 프레임워크를 사용하든 간에, 메모리 문제를 해결할 수 있는 다양한 도구와 방법이 있다는 것입니다. 적절한 도구를 선택하고 활용하면, 대용량 데이터를 효율적으로 처리할 수 있습니다. 따라서, 프로젝트의 요구 사항에 따라 적합한 기술을 선택하는 것이 중요합니다.

지금까지, 메모리 부하 없이 대용량 데이터를 처리하는 방법에 대해 알아보았습니다.
궁금하신 점은 댓글에 남겨주세요 !! ㅎㅎ 끝까지 읽어주셔서 감사합니다.

Spring Boot 메모리 걱정 없이 대용량 데이터 다루기 #2 — JDBC 사용

시나리오

시도 1: `queryForList` 메서드 사용하기

시도 2: `RowMapper`와 함께 `query` 메서드 사용하기

시도 3: `queryForStream` 메서드 사용하기

시도 4: `RowCallbackHandler`와 함께 `query` 메서드 사용하기

마지막: RowCallbackHandler와 쿼리 메서드를 사용하여 데이터 스트리밍하기

진행한 작업:

결론

Written by 개발자서동우

Spring Boot로 결제되지 않은 주문을 30분 후 자동으로 취소하는 방법

Spring Boot 어노테이션 완전 정복: 알기 쉬운 상세 설명과 예제

Spring Boot 메모리 걱정 없이 대용량 데이터 다루기 #1 — JPA 사용

Leave a Reply Cancel reply

Spring Boot 메모리 걱정 없이 대용량 데이터 다루기 #2 — JDBC 사용

시나리오

시도 1: queryForList 메서드 사용하기

시도 2: RowMapper와 함께 query 메서드 사용하기

시도 3: queryForStream 메서드 사용하기

시도 4: RowCallbackHandler와 함께 query 메서드 사용하기

마지막: RowCallbackHandler와 쿼리 메서드를 사용하여 데이터 스트리밍하기

진행한 작업:

결론

Written by 개발자서동우

Spring Boot로 결제되지 않은 주문을 30분 후 자동으로 취소하는 방법

Spring Boot 어노테이션 완전 정복: 알기 쉬운 상세 설명과 예제

Spring Boot 메모리 걱정 없이 대용량 데이터 다루기 #1 — JPA 사용

Leave a Reply Cancel reply

시도 1: `queryForList` 메서드 사용하기

시도 2: `RowMapper`와 함께 `query` 메서드 사용하기

시도 3: `queryForStream` 메서드 사용하기

시도 4: `RowCallbackHandler`와 함께 `query` 메서드 사용하기