Implementing Elasticsearch Queries as Native Queries with Spring Boot JPA

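0. Overview

This post implements Elasticsearch queries in Spring Boot as native queries, built with Spring Data Elasticsearch's NativeSearchQueryBuilder.

How to connect Spring Boot to Elasticsearch is covered in the post below.

Sending Elasticsearch queries from Spring Boot

This post is also related to the one below.

A collection of Elasticsearch queries for news data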

1. Top 100 keywords by year

1.1 Elasticsearch query

{
    "query": {
        "bool": {
            "must": [
                {
                    "range":{
                        "작성시간":{
                            "gte": "2022-01-01 00:00",
                            "lte": "2022-12-31 23:59"
                        }
                    }
                }
            ]
        }
    },
    "size": 0,
    "aggs": {
        "keyword_terms": {
            "terms": {
                "field": "키워드",
                "size": 100
            }
        }
    }
}

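1.2 Native query

import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.bucket.terms.ParsedStringTerms;
import org.springframework.data.elasticsearch.core.SearchHits;
import org.springframework.data.elasticsearch.core.query.NativeSearchQuery;
import org.springframework.data.elasticsearch.core.query.NativeSearchQueryBuilder;

// `operations` and `formatter` are fields of the enclosing class (not shown in this post).
public List<KeywordTerms> getTopKeywordsByYear(int year) {
    NativeSearchQuery query = new NativeSearchQueryBuilder()
        .withQuery(QueryBuilders.boolQuery()
            .must(QueryBuilders.rangeQuery("작성시간")
                .gte(LocalDateTime.of(year, 1, 1, 0, 0).format(formatter))
                // lt, not lte: 00:00 on January 1 of the next year belongs to the next year
                .lt(LocalDateTime.of(year + 1, 1, 1, 0, 0).format(formatter))
            )
        )
        .withMaxResults(0) // "size": 0, only the aggregation is needed
        .addAggregation(AggregationBuilders.terms("keyword_terms").field("키워드").size(100))
        .build();
    SearchHits<?> searchHits = operations.search(query, ArticleEntity.class);
    ParsedStringTerms pst = Objects.requireNonNull(searchHits.getAggregations()).get("keyword_terms");

    // One KeywordTerms per bucket: the keyword and the number of documents containing it
    return pst.getBuckets().stream()
        .map(bucket -> new KeywordTerms(bucket.getKey().toString(), bucket.getDocCount()))
        .toList();
}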
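The `formatter` field used in the method above is not shown in the post. A minimal sketch of how the year boundaries could be produced, assuming the pattern is `yyyy-MM-dd HH:mm` (inferred from literals such as "2022-01-01 00:00" in the JSON queries). The class name `YearRange` and its methods are hypothetical helpers, not part of the original code.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

final class YearRange {
    // Assumed pattern, inferred from the date literals in the JSON queries.
    static final DateTimeFormatter FORMATTER = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm");

    /** Inclusive lower bound: 00:00 on January 1 of the given year. */
    static String from(int year) {
        return LocalDateTime.of(year, 1, 1, 0, 0).format(FORMATTER);
    }

    /** Exclusive upper bound: 00:00 on January 1 of the following year. */
    static String until(int year) {
        return LocalDateTime.of(year + 1, 1, 1, 0, 0).format(FORMATTER);
    }
}
```

Passing `from(year)` to `gte` and `until(year)` to `lt` gives a half-open range, so a document stamped exactly at midnight on January 1 is counted in one year only.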

2. Top keywords by year and category

2.1 Elasticsearch query

{
    "query": {
        "bool": {
            "must": [
                {
                    "match": {
                        "대분류": "사회"
                    }
                },
                {
                    "range":{
                        "작성시간":{
                            "gte": "2022-01-01 00:00",
                            "lte": "2022-12-31 23:59"
                        }
                    }
                }
            ]
        }
    },
    "size": 0,
    "aggs": {
        "keyword_terms": {
            "terms": {
                "field": "키워드",
                "size": 50
            }
        }
    }
}

2.2 Native query

public List<KeywordTerms> getTopKeywordsByYearAndCategory(int year, CategoryType category) {
    NativeSearchQuery query = new NativeSearchQueryBuilder()
        .withQuery(QueryBuilders.boolQuery()
            .must(QueryBuilders.matchQuery("대분류", category.getName()))
            .must(QueryBuilders.rangeQuery("작성시간")
                .gte(LocalDateTime.of(year, 1, 1, 0, 0).format(formatter))
                // lt keeps 00:00 on January 1 of the next year out of the range
                .lt(LocalDateTime.of(year + 1, 1, 1, 0, 0).format(formatter))
            )
        )
        .withMaxResults(0) // "size": 0, only the aggregation is needed
        .addAggregation(AggregationBuilders.terms("keyword_terms").field("키워드").size(50))
        .build();
    SearchHits<?> searchHits = operations.search(query, ArticleEntity.class);
    ParsedStringTerms pst = Objects.requireNonNull(searchHits.getAggregations()).get("keyword_terms");

    return pst.getBuckets().stream()
        .map(bucket -> new KeywordTerms(bucket.getKey().toString(), bucket.getDocCount()))
        .toList();
}
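The methods above rely on two types the post does not show, `CategoryType` and `KeywordTerms`. A rough sketch of what they might look like, offered only as an assumption: the enum constants are hypothetical, and their labels are the "대분류" values that appear in the JSON queries in this post.

```java
/** Hypothetical category enum; each label is a "대분류" value seen in the JSON queries. */
enum CategoryType {
    SOCIETY("사회"),
    SPORTS("스포츠"),
    ECONOMY("경제");

    private final String name;

    CategoryType(String name) {
        this.name = name;
    }

    /** Label sent to Elasticsearch in the match query on "대분류". */
    String getName() {
        return name;
    }
}

/** Hypothetical result holder for one terms-aggregation bucket: keyword plus document count. */
record KeywordTerms(String keyword, long count) {}
```

A record fits `KeywordTerms` because each bucket is an immutable (key, doc count) pair.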

3. Random article by year, category, and keyword

3.1 Elasticsearch query

{
    "size": 1,
    "query": {
        "function_score": {
            "query": {
                "bool": {
                    "must": [
                        {
                            "range": {
                                "작성시간": {
                                    "gte": "2023-01-01 00:00",
                                    "lt": "2024-01-01 00:00"
                                }
                            }
                        },
                        {
                            "match": {
                                "대분류": "스포츠"
                            }
                        },
                        {
                            "match": {
                                "키워드": "손흥민"
                            }
                        },
                        {
                            "match": {
                                "요약": "손흥민"
                            }
                        }
                    ]
                }
            },
            "functions": [
                {
                    "random_score": {}
                }
            ],
            "score_mode": "sum"
        }
    }
}

3.2 Native query

// ScoreFunctionBuilders comes from org.elasticsearch.index.query.functionscore
public ArticleEntity getRandomArticleByYearAndCategoryAndKeyword(int year, CategoryType category, String keyword) {
    NativeSearchQuery query = new NativeSearchQueryBuilder()
        .withQuery(QueryBuilders.functionScoreQuery(
            QueryBuilders.boolQuery()
                .must(QueryBuilders.rangeQuery("작성시간")
                    .gte(LocalDateTime.of(year, 1, 1, 0, 0).format(formatter))
                    // "lt" as in the JSON query: the next year's first minute is excluded
                    .lt(LocalDateTime.of(year + 1, 1, 1, 0, 0).format(formatter))
                )
                .must(QueryBuilders.matchQuery("대분류", category.getName()))
                .must(QueryBuilders.matchQuery("키워드", keyword))
                .must(QueryBuilders.matchQuery("요약", keyword)),
            // random_score: matching documents come back in random order
            ScoreFunctionBuilders.randomFunction()))
        .withMaxResults(1)
        .build();
    SearchHits<ArticleEntity> search = operations.search(query, ArticleEntity.class);

    // No article matches the given year, category, and keyword
    if (search.getTotalHits() == 0) {
        return null;
    }
    return search.getSearchHit(0).getContent();
}

4. Related articles per year using more_like_this

4.1 Elasticsearch query

{
    "query": {
        "bool": {
            "must": [
                {
                    "more_like_this": {
                        "like": [
                            {
                                "_index": "hani-news-topic-index",
                                "_id": "871b5fa9-daed-44d2-a89e-acb1442bcfc0"
                            }
                        ],
                        "max_query_terms": 12,
                        "min_term_freq": 1,
                        "min_doc_freq": 5,
                        "max_doc_freq": 2147483647,
                        "min_word_length": 0,
                        "max_word_length": 0,
                        "minimum_should_match": "50%",
                        "boost_terms": 0.0,
                        "include": true,
                        "fail_on_unsupported_field": true,
                        "boost": 1.0
                    }
                },
                {
                    "range":{
                        "작성시간":{
                            "gte": "2021-01-01 00:00",
                            "lt": "2025-01-01 00:00"
                        }
                    }
                }
            ]
        }
    },
    "size": 0,
    "aggs": {
        "most_relate_per_year": {
            "date_histogram": {
                "field": "작성시간",
                "calendar_interval": "year",
                "format": "yyyy"
            },
            "aggs": {
                "related_top_per_year": {
                    "top_hits": {
                        "size": 5,
                        "sort": [
                            {
                                "_score": {
                                    "order": "desc"
                                }
                            }
                        ]
                    }
                }
            }
        }
    }
}

4.2 Native query

// Unlike the JSON query above, this version omits the range filter and the
// date_histogram aggregation; it returns the 5 most similar articles overall.
public List<ArticleEntity> getRelatedArticlesTop5(String id) {
    String index = "hani-news-topic-index";
    // The document to compare against, identified by index and id
    MoreLikeThisQueryBuilder.Item[] likeItems = {
        new MoreLikeThisQueryBuilder.Item(index, id)
    };
    NativeSearchQuery query = new NativeSearchQueryBuilder()
        .withQuery(
            QueryBuilders.moreLikeThisQuery(likeItems)
                .minTermFreq(1)
                .maxQueryTerms(12)
        )
        .build();
    SearchHits<ArticleEntity> searchHits = operations.search(query, ArticleEntity.class);
    // Hits are sorted by score, so the first 5 are the most similar
    return searchHits.getSearchHits().stream()
        .limit(5)
        .map(SearchHit::getContent)
        .toList();
}
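The JSON query in 4.1 buckets hits by year (`date_histogram`) and keeps the 5 highest-scoring hits in each bucket (`top_hits`). The following is a minimal plain-Java sketch of that grouping logic, only to illustrate what the aggregation computes; it does not call Elasticsearch. `ScoredArticle` and `TopPerYear` are hypothetical names introduced for this example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class TopPerYear {
    /** Hypothetical stand-in for a search hit: document id, publication year, relevance score. */
    record ScoredArticle(String id, int year, double score) {}

    /** Groups hits by year and keeps the `limit` highest-scoring hits in each year. */
    static Map<Integer, List<ScoredArticle>> topPerYear(List<ScoredArticle> hits, int limit) {
        // TreeMap keeps the years in ascending order, like date_histogram buckets
        Map<Integer, List<ScoredArticle>> byYear = new TreeMap<>();
        for (ScoredArticle hit : hits) {
            byYear.computeIfAbsent(hit.year(), y -> new ArrayList<>()).add(hit);
        }
        // Within each year, sort by score descending and truncate, like top_hits
        byYear.replaceAll((year, group) -> group.stream()
            .sorted(Comparator.comparingDouble(ScoredArticle::score).reversed())
            .limit(limit)
            .toList());
        return byYear;
    }
}
```

In practice the aggregation does this work on the server, so only the per-year winners travel over the network.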

5. Extracting 5 wrong-answer keywords

5.1 Elasticsearch query

{
    "query": {
        "bool": {
            "must": [
                {
                    "range": {
                        "작성시간": {
                            "gte": "2023-01-01 00:00",
                            "lt": "2024-01-01 00:00"
                        }
                    }
                },
                {
                    "match": {
                        "대분류": "경제"
                    }
                }
            ],
            "must_not": [
                {
                    "match": {
                        "키워드": "인공지능"
                    }
                }
            ]
        }
    },
    "size": 0,
    "aggs": {
        "keyword_terms": {
            "terms": {
                "field": "키워드",
                "size": 5
            }
        }
    }
}

5.2 Native query

public List<KeywordTerms> getMultipleChoice(int year, CategoryType category, String keyword, int count) {
    NativeSearchQuery query = new NativeSearchQueryBuilder()
        .withQuery(
            QueryBuilders.boolQuery()
                .must(QueryBuilders.rangeQuery("작성시간")
                    .gte(LocalDateTime.of(year, 1, 1, 0, 0).format(formatter))
                    // "lt" as in the JSON query: the next year's first minute is excluded
                    .lt(LocalDateTime.of(year + 1, 1, 1, 0, 0).format(formatter))
                )
                .must(QueryBuilders.matchQuery("대분류", category.getName()))
                // Exclude documents that carry the correct keyword
                .mustNot(QueryBuilders.matchQuery("키워드", keyword))
        )
        .withMaxResults(0) // "size": 0, only the aggregation is needed
        .addAggregation(AggregationBuilders.terms("keyword_terms").field("키워드").size(count))
        .build();

    SearchHits<?> searchHits = operations.search(query, ArticleEntity.class);
    ParsedStringTerms pst = Objects.requireNonNull(searchHits.getAggregations()).get("keyword_terms");

    return pst.getBuckets().stream()
        .map(bucket -> new KeywordTerms(bucket.getKey().toString(), bucket.getDocCount()))
        .toList();
}
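The method name `getMultipleChoice` suggests the wrong-answer keywords are combined with the correct keyword to form a quiz question. The post does not show that step, so the following is only a sketch under that assumption; `QuizChoices` and `build` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

final class QuizChoices {
    /**
     * Combines the correct keyword with the wrong-answer keywords and shuffles them,
     * so the correct answer does not always sit in the same position.
     */
    static List<String> build(String answer, List<String> wrongKeywords, Random random) {
        List<String> choices = new ArrayList<>(wrongKeywords);
        choices.add(answer);
        Collections.shuffle(choices, random);
        return choices;
    }
}
```

Taking the `Random` as a parameter lets a test pass a seeded instance and get a reproducible order.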

6. Random article by year and category

6.1 Elasticsearch query

6.2 Native query

public ArticleEntity getRandomArticleByYearAndCategory(int year, CategoryType category) {
    NativeSearchQuery query = new NativeSearchQueryBuilder()
        .withQuery(QueryBuilders.functionScoreQuery(
            QueryBuilders.boolQuery()
                .must(QueryBuilders.rangeQuery("작성시간")
                    .gte(LocalDateTime.of(year, 1, 1, 0, 0).format(formatter))
                    // lt keeps 00:00 on January 1 of the next year out of the range
                    .lt(LocalDateTime.of(year + 1, 1, 1, 0, 0).format(formatter))
                )
                .must(QueryBuilders.matchQuery("대분류", category.getName())),
            // random_score: matching documents come back in random order
            ScoreFunctionBuilders.randomFunction()))
        .withMaxResults(1)
        .build();
    SearchHits<ArticleEntity> search = operations.search(query, ArticleEntity.class);

    // Guard against an empty result; getSearchHit(0) would throw otherwise
    if (search.getTotalHits() == 0) {
        return null;
    }
    return search.getSearchHit(0).getContent();
}