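0. Overview
In this post, I implement Elasticsearch queries as native queries (NativeSearchQuery) using Spring Boot and Spring Data Elasticsearch.
How to connect Spring Boot to Elasticsearch is covered in the post below.
This post is also related to the post below.
The index uses Korean field names: 대분류 (major category), 작성시간 (publication time, written as yyyy-MM-dd HH:mm in the examples), 키워드 (keyword), and 요약 (summary).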
1. Top 100 keywords by year
1.1 Elasticsearch query
{
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "작성시간": {
              "gte": "2022-01-01 00:00",
              "lt": "2023-01-01 00:00"
            }
          }
        }
      ]
    }
  },
  "size": 0,
  "aggs": {
    "keyword_terms": {
      "terms": {
        "field": "키워드",
        "size": 100
      }
    }
  }
}
1.2 Native query
// operations: the injected ElasticsearchOperations bean
// formatter: a DateTimeFormatter field, assumed to use the "yyyy-MM-dd HH:mm" pattern of 작성시간
public List<KeywordTerms> getTopKeywordsByYear(int year) {
    NativeSearchQuery query = new NativeSearchQueryBuilder()
            .withQuery(QueryBuilders.boolQuery()
                    // half-open range: [Jan 1 of year, Jan 1 of next year)
                    .must(QueryBuilders.rangeQuery("작성시간")
                            .gte(LocalDateTime.of(year, 1, 1, 0, 0).format(formatter))
                            .lt(LocalDateTime.of(year + 1, 1, 1, 0, 0).format(formatter))
                    )
            ).withMaxResults(0) // only the aggregation is needed, not the documents
            .addAggregation(AggregationBuilders.terms("keyword_terms").field("키워드").size(100))
            .build();
    SearchHits<?> searchHits = operations.search(query, ArticleEntity.class);
    ParsedStringTerms pst = Objects.requireNonNull(searchHits.getAggregations()).get("keyword_terms");
    return pst.getBuckets().stream().map(s ->
            new KeywordTerms(s.getKey().toString(), s.getDocCount())
    ).toList();
}
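The KeywordTerms type is not shown in this post. Judging only from the call new KeywordTerms(s.getKey().toString(), s.getDocCount()), a minimal sketch could be a record holding a keyword and its document count. The component names keyword and count, the validation, and the BY_COUNT_DESC comparator are assumptions for illustration, not the author's class.

```java
import java.util.Comparator;

/**
 * Hypothetical sketch of the KeywordTerms type used by the queries.
 * Only the (String, long) constructor shape comes from the post.
 */
record KeywordTerms(String keyword, long count) {

    KeywordTerms {
        // Reject obviously invalid buckets early.
        if (keyword == null || keyword.isBlank()) {
            throw new IllegalArgumentException("keyword must not be blank");
        }
        if (count < 0) {
            throw new IllegalArgumentException("count must not be negative");
        }
    }

    /**
     * Mirrors the default order of a terms aggregation:
     * highest doc count first, ties broken by key ascending.
     */
    static final Comparator<KeywordTerms> BY_COUNT_DESC =
            Comparator.comparingLong(KeywordTerms::count).reversed()
                    .thenComparing(KeywordTerms::keyword);
}
```

The comparator is only useful if you merge or re-sort buckets on the client; the aggregation itself already returns buckets in this order.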
2. Top keywords by year and major category
2.1 Elasticsearch query
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "대분류": "사회"
          }
        },
        {
          "range": {
            "작성시간": {
              "gte": "2022-01-01 00:00",
              "lt": "2023-01-01 00:00"
            }
          }
        }
      ]
    }
  },
  "size": 0,
  "aggs": {
    "keyword_terms": {
      "terms": {
        "field": "키워드",
        "size": 100
      }
    }
  }
}
2.2 Native query
public List<KeywordTerms> getTopKeywordsByYearAndCategory(int year, CategoryType category) {
    NativeSearchQuery query = new NativeSearchQueryBuilder()
            .withQuery(QueryBuilders.boolQuery()
                    .must(QueryBuilders.matchQuery("대분류", category.getName()))
                    // half-open range: [Jan 1 of year, Jan 1 of next year)
                    .must(QueryBuilders.rangeQuery("작성시간")
                            .gte(LocalDateTime.of(year, 1, 1, 0, 0).format(formatter))
                            .lt(LocalDateTime.of(year + 1, 1, 1, 0, 0).format(formatter))
                    )
            ).withMaxResults(0) // only the aggregation is needed
            .addAggregation(AggregationBuilders.terms("keyword_terms").field("키워드").size(50))
            .build();
    SearchHits<?> searchHits = operations.search(query, ArticleEntity.class);
    ParsedStringTerms pst = Objects.requireNonNull(searchHits.getAggregations()).get("keyword_terms");
    return pst.getBuckets().stream().map(s ->
            new KeywordTerms(s.getKey().toString(), s.getDocCount())
    ).toList();
}
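Every method in this post computes the same year boundaries. Assuming formatter uses the yyyy-MM-dd HH:mm pattern that appears in the JSON examples, a hypothetical helper (the YearRange name and its methods are my own) could centralize a half-open range: gte the first minute of the year and lt the first minute of the next year. That way no moment of December 31 is dropped and nothing from January 1 of the next year leaks in.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper for the per-year range used by every query in this post. */
final class YearRange {

    // Assumed to match the `formatter` field the query methods rely on.
    static final DateTimeFormatter FORMATTER = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm");

    private YearRange() {}

    /** Inclusive lower bound (use with gte): January 1 of the year at 00:00. */
    static String start(int year) {
        return LocalDateTime.of(year, 1, 1, 0, 0).format(FORMATTER);
    }

    /** Exclusive upper bound (use with lt, not lte): January 1 of the next year at 00:00. */
    static String endExclusive(int year) {
        return LocalDateTime.of(year + 1, 1, 1, 0, 0).format(FORMATTER);
    }

    /** Local equivalent of the gte/lt check, handy for unit tests. */
    static boolean contains(int year, LocalDateTime writtenAt) {
        LocalDateTime from = LocalDateTime.of(year, 1, 1, 0, 0);
        LocalDateTime to = from.plusYears(1);
        return !writtenAt.isBefore(from) && writtenAt.isBefore(to);
    }
}
```

In the builders this would read rangeQuery("작성시간").gte(YearRange.start(year)).lt(YearRange.endExclusive(year)).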
3. Random article by year, major category, and keyword
3.1 Elasticsearch query
{
  "size": 1,
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "must": [
            {
              "range": {
                "작성시간": {
                  "gte": "2023-01-01 00:00",
                  "lt": "2024-01-01 00:00"
                }
              }
            },
            {
              "match": {
                "대분류": "스포츠"
              }
            },
            {
              "match": {
                "키워드": "손흥민"
              }
            },
            {
              "match": {
                "요약": "손흥민"
              }
            }
          ]
        }
      },
      "functions": [
        {
          "random_score": {}
        }
      ],
      "score_mode": "sum"
    }
  }
}
3.2 Native query
public ArticleEntity getRandomArticleByYearAndCategoryAndKeyword(int year, CategoryType category, String keyword) {
    NativeSearchQuery query = new NativeSearchQueryBuilder()
            .withQuery(QueryBuilders.functionScoreQuery(
                    QueryBuilders.boolQuery()
                            // half-open range: [Jan 1 of year, Jan 1 of next year)
                            .must(QueryBuilders.rangeQuery("작성시간")
                                    .gte(LocalDateTime.of(year, 1, 1, 0, 0).format(formatter))
                                    .lt(LocalDateTime.of(year + 1, 1, 1, 0, 0).format(formatter))
                            )
                            .must(QueryBuilders.matchQuery("대분류", category.getName()))
                            .must(QueryBuilders.matchQuery("키워드", keyword))
                            .must(QueryBuilders.matchQuery("요약", keyword)),
                    // random_score shuffles the matches; size 1 picks one of them
                    ScoreFunctionBuilders.randomFunction())).withMaxResults(1)
            .build();
    SearchHits<ArticleEntity> search = operations.search(query, ArticleEntity.class);
    if (search.getTotalHits() == 0) {
        return null; // no article matches all conditions
    }
    return search.getSearchHit(0).getContent();
}
4. Related articles per year using more_like_this
4.1 Elasticsearch query
{
  "query": {
    "bool": {
      "must": [
        {
          "more_like_this": {
            "like": [
              {
                "_index": "hani-news-topic-index",
                "_id": "871b5fa9-daed-44d2-a89e-acb1442bcfc0"
              }
            ],
            "max_query_terms": 12,
            "min_term_freq": 1,
            "min_doc_freq": 5,
            "max_doc_freq": 2147483647,
            "min_word_length": 0,
            "max_word_length": 0,
            "minimum_should_match": "50%",
            "boost_terms": 0.0,
            "include": true,
            "fail_on_unsupported_field": true,
            "boost": 1.0
          }
        },
        {
          "range": {
            "작성시간": {
              "gte": "2021-01-01 00:00",
              "lt": "2025-01-01 00:00"
            }
          }
        }
      ]
    }
  },
  "size": 0,
  "aggs": {
    "most_relate_per_year": {
      "date_histogram": {
        "field": "작성시간",
        "calendar_interval": "year",
        "format": "yyyy"
      },
      "aggs": {
        "related_top_per_year": {
          "top_hits": {
            "size": 5,
            "sort": [
              {
                "_score": {
                  "order": "desc"
                }
              }
            ]
          }
        }
      }
    }
  }
}
4.2 Native query
public List<ArticleEntity> getRelatedArticlesTop5(String id) {
    String index = "hani-news-topic-index";
    // use the stored document with this id as the "like" source
    MoreLikeThisQueryBuilder.Item[] likeItems = {
            new MoreLikeThisQueryBuilder.Item(index, id)
    };
    // Unlike the JSON in 4.1, this query has no year range and no per-year aggregation:
    // it returns the 5 most similar articles overall.
    NativeSearchQuery query = new NativeSearchQueryBuilder()
            .withQuery(
                    QueryBuilders.moreLikeThisQuery(likeItems)
                            .minTermFreq(1)
                            .maxQueryTerms(12)
            )
            .withMaxResults(5) // let Elasticsearch return only 5 hits instead of trimming client-side
            .build();
    SearchHits<ArticleEntity> searchHits = operations.search(query, ArticleEntity.class);
    return searchHits.getSearchHits().stream()
            .map(SearchHit::getContent)
            .toList();
}
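The Java method returns the five most similar articles overall, while the JSON in 4.1 groups hits per year with date_histogram and top_hits. If you fetch a larger set of hits and want the same per-year grouping without the aggregation API, a client-side sketch could look like this. ScoredArticle is a hypothetical stand-in for a hit (id, 작성시간 value, score), not a Spring Data type, and grouping on the client is only a substitute for the server-side aggregation, not equivalent to it on large result sets.

```java
import java.time.LocalDateTime;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for one search hit: document id, publication time, relevance score.
record ScoredArticle(String id, LocalDateTime writtenAt, float score) {}

final class PerYearTopHits {

    private PerYearTopHits() {}

    /**
     * Buckets hits by calendar year (similar to date_histogram with calendar_interval "year")
     * and keeps the `size` highest-scoring hits per bucket (similar to top_hits sorted by _score desc).
     * Years come back in ascending order because a TreeMap is used.
     */
    static Map<Integer, List<ScoredArticle>> topPerYear(List<ScoredArticle> hits, int size) {
        Map<Integer, List<ScoredArticle>> byYear = new TreeMap<>();
        for (ScoredArticle hit : hits) {
            byYear.computeIfAbsent(hit.writtenAt().getYear(), y -> new ArrayList<>()).add(hit);
        }
        Comparator<ScoredArticle> byScoreDesc =
                Comparator.comparingDouble(ScoredArticle::score).reversed();
        byYear.replaceAll((year, yearHits) ->
                yearHits.stream().sorted(byScoreDesc).limit(size).toList());
        return byYear;
    }
}
```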
5. Extracting five wrong-answer (distractor) keywords
5.1 Elasticsearch query
{
  "query": {
    "bool": {
      "must": [
        {
          "range": {
            "작성시간": {
              "gte": "2023-01-01 00:00",
              "lt": "2024-01-01 00:00"
            }
          }
        },
        {
          "match": {
            "대분류": "경제"
          }
        }
      ],
      "must_not": [
        {
          "match": {
            "키워드": "인공지능"
          }
        }
      ]
    }
  },
  "size": 0,
  "aggs": {
    "keyword_terms": {
      "terms": {
        "field": "키워드",
        "size": 5
      }
    }
  }
}
5.2 Native query
public List<KeywordTerms> getMultipleChoice(int year, CategoryType category, String keyword, int count) {
    NativeSearchQuery query = new NativeSearchQueryBuilder()
            .withQuery(
                    QueryBuilders.boolQuery()
                            // half-open range: [Jan 1 of year, Jan 1 of next year)
                            .must(QueryBuilders.rangeQuery("작성시간")
                                    .gte(LocalDateTime.of(year, 1, 1, 0, 0).format(formatter))
                                    .lt(LocalDateTime.of(year + 1, 1, 1, 0, 0).format(formatter))
                            )
                            .must(QueryBuilders.matchQuery("대분류", category.getName()))
                            // exclude articles tagged with the correct answer
                            .mustNot(QueryBuilders.matchQuery("키워드", keyword))
            ).withMaxResults(0)
            .addAggregation(AggregationBuilders.terms("keyword_terms").field("키워드").size(count))
            .build();
    SearchHits<?> searchHits = operations.search(query, ArticleEntity.class);
    ParsedStringTerms pst = Objects.requireNonNull(searchHits.getAggregations()).get("keyword_terms");
    return pst.getBuckets().stream().map(s ->
            new KeywordTerms(s.getKey().toString(), s.getDocCount())
    ).toList();
}
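getMultipleChoice returns only the wrong answers. One way to turn them into a question is to add the correct keyword back and shuffle. The sketch below is a hypothetical helper (the MultipleChoice.options name and the injected Random are my own choices, not from the post) that works on the keyword strings after you map the returned KeywordTerms to them.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

final class MultipleChoice {

    private MultipleChoice() {}

    /**
     * Hypothetical helper: merges the correct keyword with the distractors and shuffles them.
     * Duplicates are dropped as a safeguard so the answer can never appear twice.
     */
    static List<String> options(String answer, List<String> distractors, Random random) {
        Set<String> unique = new LinkedHashSet<>();
        unique.add(answer);
        unique.addAll(distractors);
        List<String> options = new ArrayList<>(unique);
        // Passing a seeded Random makes the order reproducible in tests.
        Collections.shuffle(options, random);
        return options;
    }
}
```

Injecting the Random keeps production code random while letting tests pin the order with a fixed seed.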
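6. Random article by year and major category
6.1 Elasticsearch query
{
  "size": 1,
  "query": {
    "function_score": {
      "query": {
        "bool": {
          "must": [
            {
              "range": {
                "작성시간": {
                  "gte": "2023-01-01 00:00",
                  "lt": "2024-01-01 00:00"
                }
              }
            },
            {
              "match": {
                "대분류": "스포츠"
              }
            }
          ]
        }
      },
      "functions": [
        {
          "random_score": {}
        }
      ],
      "score_mode": "sum"
    }
  }
}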
6.2 Native query
public ArticleEntity getRandomArticleByYearAndCategory(int year, CategoryType category) {
    NativeSearchQuery query = new NativeSearchQueryBuilder()
            .withQuery(QueryBuilders.functionScoreQuery(
                    QueryBuilders.boolQuery()
                            // half-open range: [Jan 1 of year, Jan 1 of next year)
                            .must(QueryBuilders.rangeQuery("작성시간")
                                    .gte(LocalDateTime.of(year, 1, 1, 0, 0).format(formatter))
                                    .lt(LocalDateTime.of(year + 1, 1, 1, 0, 0).format(formatter))
                            )
                            .must(QueryBuilders.matchQuery("대분류", category.getName())),
                    ScoreFunctionBuilders.randomFunction())).withMaxResults(1)
            .build();
    SearchHits<ArticleEntity> search = operations.search(query, ArticleEntity.class);
    // getSearchHit(0) throws on an empty result, so check first
    if (search.getTotalHits() == 0) {
        return null;
    }
    return search.getSearchHit(0).getContent();
}