Implementing Elasticsearch Queries with Spring Boot JPA Native Queries

ksb-dev 2024. 5. 26. 13:42

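0. Overview

This post uses Spring Boot and JPA-style native queries to implement Elasticsearch queries. Each section shows the raw Elasticsearch query first, then a Java method built with NativeSearchQueryBuilder.

The index uses Korean field names, which are kept as-is in the queries: 대분류 (top-level category), 작성시간 (time written), 키워드 (keyword), and 요약 (summary).

How to connect Spring Boot to Elasticsearch is covered in the post below.

Sending Elasticsearch queries from Spring Boot

This post is also related to the post below.

A collection of Elasticsearch queries for news data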

1. Top 100 keywords by year

1.1 Elasticsearch query

{
    "query": {
        "bool": {
            "must": [
                {
                    "range":{
                        "작성시간":{
                            "gte": "2022-01-01 00:00",
                            "lte": "2022-12-31 23:59"
                        }
                    }
                }
            ]
        }
    },
    "size": 0,
    "aggs": {
        "keyword_terms": {
            "terms": {
                "field": "키워드",
                "size": 100
            }
        }
    }
}

1.2 Native query

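// Assumes two fields defined elsewhere in the class (not shown in this post):
//   ElasticsearchOperations operations
//   DateTimeFormatter formatter, e.g. DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm")
public List<KeywordTerms> getTopKeywordsByYear(int year) {
    NativeSearchQuery query = new NativeSearchQueryBuilder()
        .withQuery(QueryBuilders.boolQuery()
            // Half-open range: [Jan 1 of year, Jan 1 of year + 1)
            .must(QueryBuilders.rangeQuery("작성시간")
                .gte(LocalDateTime.of(year, 1, 1, 0, 0).format(formatter))
                .lt(LocalDateTime.of(year + 1, 1, 1, 0, 0).format(formatter))
            )
        ).withMaxResults(0) // aggregation only; no documents are returned
        .addAggregation(AggregationBuilders.terms("keyword_terms").field("키워드").size(100))
        .build();
    SearchHits<?> searchHits = operations.search(query, ArticleEntity.class);
    ParsedStringTerms pst = Objects.requireNonNull(searchHits.getAggregations()).get("keyword_terms");

    // Each bucket holds one keyword and the number of documents containing it
    return pst.getBuckets().stream().map(s ->
        new KeywordTerms(s.getKey().toString(), s.getDocCount())
    ).toList();
}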
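
Every native query in this post builds its 작성시간 range from two LocalDateTime values formatted with `formatter`. The following is a minimal sketch of that computation on its own, assuming the pattern "yyyy-MM-dd HH:mm" (inferred from the date literals in the Elasticsearch queries). `YearRange` and its method names are hypothetical and are not part of the original project.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: computes the bounds of a one-year range as query strings.
class YearRange {
    // Assumed pattern, inferred from literals such as "2022-01-01 00:00".
    static final DateTimeFormatter FORMATTER = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm");

    // Inclusive lower bound: January 1 of the given year at 00:00.
    static String start(int year) {
        return LocalDateTime.of(year, 1, 1, 0, 0).format(FORMATTER);
    }

    // Exclusive upper bound: January 1 of the following year at 00:00.
    static String endExclusive(int year) {
        return LocalDateTime.of(year + 1, 1, 1, 0, 0).format(FORMATTER);
    }

    public static void main(String[] args) {
        System.out.println(start(2022) + " <= 작성시간 < " + endExclusive(2022));
    }
}
```

Pairing `start` with gte and `endExclusive` with lt covers the whole year without also matching the first minute of the next one.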

2. Top keywords by year and category

2.1 Elasticsearch query

{
    "query": {
        "bool": {
            "must": [
                {
                    "match": {
                        "대분류": "사회"
                    }
                },
                {
                    "range":{
                        "작성시간":{
                            "gte": "2022-01-01 00:00",
                            "lte": "2022-12-31 23:59"
                        }
                    }
                }
            ]
        }
    },
    "size": 0,
    "aggs": {
        "keyword_terms": {
            "terms": {
                "field": "키워드",
                "size": 100
            }
        }
    }
}

2.2 Native query

public List<KeywordTerms> getTopKeywordsByYearAndCategory(int year, CategoryType category) {
    NativeSearchQuery query = new NativeSearchQueryBuilder()
        .withQuery(QueryBuilders.boolQuery()
            .must(QueryBuilders.matchQuery("대분류", category.getName()))
            // Half-open range: [Jan 1 of year, Jan 1 of year + 1)
            .must(QueryBuilders.rangeQuery("작성시간")
                .gte(LocalDateTime.of(year, 1, 1, 0, 0).format(formatter))
                .lt(LocalDateTime.of(year + 1, 1, 1, 0, 0).format(formatter))
            )
        ).withMaxResults(0) // aggregation only; no documents are returned
        .addAggregation(AggregationBuilders.terms("keyword_terms").field("키워드").size(50))
        .build();
    SearchHits<?> searchHits = operations.search(query, ArticleEntity.class);
    ParsedStringTerms pst = Objects.requireNonNull(searchHits.getAggregations()).get("keyword_terms");

    // Each bucket holds one keyword and the number of documents containing it
    return pst.getBuckets().stream().map(s ->
        new KeywordTerms(s.getKey().toString(), s.getDocCount())
    ).toList();
}

3. Random article by year, category, and keyword

3.1 Elasticsearch query

{
    "size": 1,
    "query": {
        "function_score": {
            "query": {
                "bool": {
                    "must": [
                        {
                            "range": {
                                "작성시간": {
                                    "gte": "2023-01-01 00:00",
                                    "lt": "2024-01-01 00:00"
                                }
                            }
                        },
                        {
                            "match": {
                                "대분류": "스포츠"
                            }
                        },
                        {
                            "match": {
                                "키워드": "손흥민"
                            }
                        },
                        {
                            "match": {
                                "요약": "손흥민"
                            }
                        }
                    ]
                }
            },
            "functions": [
                {
                    "random_score": {}
                }
            ],
            "score_mode": "sum"
        }
    }
}

3.2 Native query

public ArticleEntity getRandomArticleByYearAndCategoryAndKeyword(int year, CategoryType category, String keyword) {
    NativeSearchQuery query = new NativeSearchQueryBuilder()
        .withQuery(QueryBuilders.functionScoreQuery(
            QueryBuilders.boolQuery()
                // Half-open range: [Jan 1 of year, Jan 1 of year + 1)
                .must(QueryBuilders.rangeQuery("작성시간")
                    .gte(LocalDateTime.of(year, 1, 1, 0, 0).format(formatter))
                    .lt(LocalDateTime.of(year + 1, 1, 1, 0, 0).format(formatter))
                )
                .must(QueryBuilders.matchQuery("대분류", category.getName()))
                .must(QueryBuilders.matchQuery("키워드", keyword))
                .must(QueryBuilders.matchQuery("요약", keyword)),
            // random_score: each matching document gets a random score factor
            ScoreFunctionBuilders.randomFunction()))
        .withMaxResults(1) // keep only the top-scored document
        .build();
    SearchHits<ArticleEntity> search = operations.search(query, ArticleEntity.class);

    // No article matched the year, category, and keyword
    if (search.getTotalHits() == 0) {
        return null;
    }
    return search.getSearchHit(0).getContent();
}
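
To make the effect of random_score concrete, here is a small in-memory sketch, not Elasticsearch code. It assumes the default function_score combination, in which the query score is multiplied by the function score, and a random value drawn uniformly from [0, 1). `RandomScoreSketch`, `Match`, and `pickOne` are hypothetical names used only for illustration.

```java
import java.util.List;
import java.util.Optional;
import java.util.Random;

// In-memory illustration of "size 1 + random_score": not an Elasticsearch client call.
class RandomScoreSketch {
    // Hypothetical stand-in for a matching document and its relevance score.
    record Match(String id, double queryScore) {}

    // Returns the match whose queryScore * random value is highest, or empty if nothing matched.
    static Optional<Match> pickOne(List<Match> matches, Random random) {
        Match best = null;
        double bestScore = -1.0;
        for (Match m : matches) {
            // Assumed combination: relevance multiplied by a random value in [0, 1)
            double score = m.queryScore() * random.nextDouble();
            if (score > bestScore) {
                bestScore = score;
                best = m;
            }
        }
        return Optional.ofNullable(best);
    }

    public static void main(String[] args) {
        List<Match> matches = List.of(new Match("a", 1.2), new Match("b", 0.8), new Match("c", 2.0));
        System.out.println(pickOne(matches, new Random()).map(Match::id).orElse("no match"));
    }
}
```

Under this assumption the pick is random but not uniform: documents with a higher relevance score are somewhat more likely to win.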

4. Related articles per year using more_like_this

4.1 Elasticsearch query

{
    "query": {
        "bool": {
            "must": [
                {
                    "more_like_this": {
                        "like": [
                            {
                                "_index": "hani-news-topic-index",
                                "_id": "871b5fa9-daed-44d2-a89e-acb1442bcfc0"
                            }
                        ],
                        "max_query_terms": 12,
                        "min_term_freq": 1,
                        "min_doc_freq": 5,
                        "max_doc_freq": 2147483647,
                        "min_word_length": 0,
                        "max_word_length": 0,
                        "minimum_should_match": "50%",
                        "boost_terms": 0.0,
                        "include": true,
                        "fail_on_unsupported_field": true,
                        "boost": 1.0
                    }
                },
                {
                    "range":{
                        "작성시간":{
                            "gte": "2021-01-01 00:00",
                            "lt": "2025-01-01 00:00"
                        }
                    }
                }
            ]
        }
    },
    "size": 0,
    "aggs": {
        "most_relate_per_year": {
            "date_histogram": {
                "field": "작성시간",
                "calendar_interval": "year",
                "format": "yyyy"
            },
            "aggs": {
                "related_top_per_year": {
                    "top_hits": {
                        "size": 5,
                        "sort": [
                            {
                                "_score": {
                                    "order": "desc"
                                }
                            }
                        ]
                    }
                }
            }
        }
    }
}

4.2 Native query

public List<ArticleEntity> getRelatedArticlesTop5(String id) {
    String index = "hani-news-topic-index";
    MoreLikeThisQueryBuilder.Item[] likeItems = {
        new MoreLikeThisQueryBuilder.Item(index, id)
    };
    // Only the more_like_this clause is sent here; the range filter and the
    // per-year aggregation from 4.1 are not part of this query
    NativeSearchQuery query = new NativeSearchQueryBuilder()
        .withQuery(
            QueryBuilders.moreLikeThisQuery(likeItems)
                .minTermFreq(1)
                .maxQueryTerms(12)
        )
        .build();
    SearchHits<ArticleEntity> searchHits = operations.search(query, ArticleEntity.class);
    // Hits come back sorted by score, so the first five are the most similar articles
    return searchHits.getSearchHits().stream()
        .limit(5)
        .map(SearchHit::getContent)
        .toList();
}
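
The query in 4.1 groups the related articles by year (date_histogram) and keeps the five best-scored hits in each group (top_hits). The sketch below reproduces that grouping in memory, purely to illustrate what the aggregation computes. `TopHitsPerYearSketch`, `Hit`, and `topPerYear` are hypothetical names, and the scores are made-up sample values.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory illustration of date_histogram (by year) + top_hits (by score).
class TopHitsPerYearSketch {
    // Hypothetical stand-in for a hit: document id, year of 작성시간, relevance score.
    record Hit(String id, int year, double score) {}

    // Groups hits by year and keeps at most `size` hits per year, best score first.
    static Map<Integer, List<Hit>> topPerYear(List<Hit> hits, int size) {
        Map<Integer, List<Hit>> byYear = new TreeMap<>(); // years in ascending order
        for (Hit hit : hits) {
            byYear.computeIfAbsent(hit.year(), y -> new ArrayList<>()).add(hit);
        }
        // Replace each year's bucket with its top `size` hits, sorted by score descending
        byYear.replaceAll((year, group) -> group.stream()
            .sorted(Comparator.comparingDouble(Hit::score).reversed())
            .limit(size)
            .toList());
        return byYear;
    }

    public static void main(String[] args) {
        List<Hit> hits = List.of(
            new Hit("a", 2022, 1.0), new Hit("b", 2022, 3.0), new Hit("c", 2023, 2.0));
        topPerYear(hits, 5).forEach((year, top) -> System.out.println(year + " " + top));
    }
}
```

In a real deployment this grouping would normally stay inside Elasticsearch; the sketch only shows the shape of the result, one short ranked list per year.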

5. Extracting five wrong-answer keywords

5.1 Elasticsearch query

{
    "query": {
        "bool": {
            "must": [
                {
                    "range": {
                        "작성시간": {
                            "gte": "2023-01-01 00:00",
                            "lt": "2024-01-01 00:00"
                        }
                    }
                },
                {
                    "match": {
                        "대분류": "경제"
                    }
                }
            ],
            "must_not": [
                {
                    "match": {
                        "키워드": "인공지능"
                    }
                }
            ]
        }
    },
    "size": 0,
    "aggs": {
        "keyword_terms": {
            "terms": {
                "field": "키워드",
                "size": 5
            }
        }
    }
}

5.2 Native query

public List<KeywordTerms> getMultipleChoice(int year, CategoryType category, String keyword, int count) {
    NativeSearchQuery query = new NativeSearchQueryBuilder()
        .withQuery(
            QueryBuilders.boolQuery()
                // Half-open range: [Jan 1 of year, Jan 1 of year + 1)
                .must(QueryBuilders.rangeQuery("작성시간")
                    .gte(LocalDateTime.of(year, 1, 1, 0, 0).format(formatter))
                    .lt(LocalDateTime.of(year + 1, 1, 1, 0, 0).format(formatter))
                )
                .must(QueryBuilders.matchQuery("대분류", category.getName()))
                // Exclude articles that carry the correct keyword
                .mustNot(QueryBuilders.matchQuery("키워드", keyword))
        ).withMaxResults(0) // aggregation only; no documents are returned
        .addAggregation(AggregationBuilders.terms("keyword_terms").field("키워드").size(count))
        .build();

    SearchHits<?> searchHits = operations.search(query, ArticleEntity.class);
    ParsedStringTerms pst = Objects.requireNonNull(searchHits.getAggregations()).get("keyword_terms");

    // The most frequent remaining keywords serve as the wrong answers
    return pst.getBuckets().stream().map(s ->
        new KeywordTerms(s.getKey().toString(), s.getDocCount())
    ).toList();
}
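
The logic of this query can be sketched in memory: drop every article that contains the correct keyword, count the keywords of the remaining articles, and keep the most frequent ones. The code below is an illustration only, not the original implementation. `WrongKeywordSketch` and `wrongKeywords` are hypothetical names, and ties are broken alphabetically here simply to keep the result deterministic.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// In-memory illustration of must_not + terms aggregation.
class WrongKeywordSketch {
    // Each element of `docs` is the keyword set of one article.
    static List<String> wrongKeywords(List<Set<String>> docs, String answer, int count) {
        Map<String, Long> freq = new HashMap<>();
        for (Set<String> keywords : docs) {
            if (keywords.contains(answer)) {
                continue; // must_not: skip articles that contain the correct keyword
            }
            for (String k : keywords) {
                freq.merge(k, 1L, Long::sum); // one document count per keyword
            }
        }
        Comparator<Map.Entry<String, Long>> order = (a, b) -> {
            int byCount = Long.compare(b.getValue(), a.getValue()); // higher count first
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        };
        return freq.entrySet().stream()
            .sorted(order)
            .limit(count)
            .map(Map.Entry::getKey)
            .toList();
    }

    public static void main(String[] args) {
        List<Set<String>> docs = List.of(
            Set.of("ai", "rates"), Set.of("rates", "fx"), Set.of("fx"), Set.of("ai", "fx"));
        System.out.println(wrongKeywords(docs, "ai", 2));
    }
}
```

Because whole articles are excluded rather than single keywords, the correct keyword can never appear among the returned buckets.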

6. Random article by year and category

6.1 Elasticsearch query
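
The original post leaves this subsection empty. As a rough sketch, the query that the native code in 6.2 appears to build would be the query from 3.1 without the two keyword clauses. The year 2023 and the category "스포츠" are example values borrowed from 3.1, not values given for this section.

```json
{
    "size": 1,
    "query": {
        "function_score": {
            "query": {
                "bool": {
                    "must": [
                        {
                            "range": {
                                "작성시간": {
                                    "gte": "2023-01-01 00:00",
                                    "lt": "2024-01-01 00:00"
                                }
                            }
                        },
                        {
                            "match": {
                                "대분류": "스포츠"
                            }
                        }
                    ]
                }
            },
            "functions": [
                {
                    "random_score": {}
                }
            ]
        }
    }
}
```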

6.2 Native query

public ArticleEntity getRandomArticleByYearAndCategory(int year, CategoryType category) {
    NativeSearchQuery query = new NativeSearchQueryBuilder()
        .withQuery(QueryBuilders.functionScoreQuery(
            QueryBuilders.boolQuery()
                // Half-open range: [Jan 1 of year, Jan 1 of year + 1)
                .must(QueryBuilders.rangeQuery("작성시간")
                    .gte(LocalDateTime.of(year, 1, 1, 0, 0).format(formatter))
                    .lt(LocalDateTime.of(year + 1, 1, 1, 0, 0).format(formatter))
                )
                .must(QueryBuilders.matchQuery("대분류", category.getName())),
            ScoreFunctionBuilders.randomFunction()))
        .withMaxResults(1) // keep only the top-scored document
        .build();
    SearchHits<ArticleEntity> search = operations.search(query, ArticleEntity.class);

    // Return null when nothing matches, as in 3.2; getSearchHit(0) would throw on an empty result
    if (search.getTotalHits() == 0) {
        return null;
    }
    return search.getSearchHit(0).getContent();
}