Handling map cluster data.

Introduction.

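If you build a map service, you inevitably have to deal with pin data, and sooner or later you also have to implement clustering, where many nearby pins are merged into a single cluster.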

The following questions caused me quite a bit of trial and error:

At what distance should pins be grouped into one cluster? (distance)

How many clusters should be shown at each zoom level provided by the map's client SDK?

Before I start: the content below was written with reference to GPT.

Model setup.

Expressed as a Prisma schema:


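model Location {
  id      Int    @id @default(autoincrement()) // Prisma models need a unique identifier
  lat     Float  // latitude; Int would drop the decimal degrees
  lng     Float  // longitude
  geohash String // computed from lat/lng when the row is saved
  title   String
  // ... other fields
}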

As shown above, the essential fields are lat, lng, and geohash: geohash is used later for querying through the ORM, and lat and lng are used for clustering.
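The geohash column has to be filled in whenever a row is saved. With the ngeohash package installed below, `ngeohash.encode(lat, lng, precision)` handles this; for reference, here is a dependency-free sketch of the standard encoding (the name `encodeGeohash` and the default length of 9 are illustrative choices). It also shows why the server can filter with a prefix match: each extra character only subdivides the previous cell, so a point's short hash is always a prefix of its longer hash.

```typescript
// Base32 alphabet used by geohash (the letters a, i, l and o are skipped)
const GEOHASH_BASE32 = '0123456789bcdefghjkmnpqrstuvwxyz'

// Encode a coordinate by repeatedly halving the longitude/latitude ranges.
// Bits alternate lng, lat, lng, ...; every 5 bits become one base32 character.
function encodeGeohash(lat: number, lng: number, precision = 9): string {
  let latMin = -90
  let latMax = 90
  let lngMin = -180
  let lngMax = 180
  let isLngBit = true
  let bitCount = 0
  let charIndex = 0
  let hash = ''

  while (hash.length < precision) {
    if (isLngBit) {
      const mid = (lngMin + lngMax) / 2
      if (lng >= mid) {
        charIndex = charIndex * 2 + 1 // upper half
        lngMin = mid
      } else {
        charIndex = charIndex * 2 // lower half
        lngMax = mid
      }
    } else {
      const mid = (latMin + latMax) / 2
      if (lat >= mid) {
        charIndex = charIndex * 2 + 1
        latMin = mid
      } else {
        charIndex = charIndex * 2
        latMax = mid
      }
    }
    isLngBit = !isLngBit
    bitCount += 1

    // Five bits collected: emit a character and start the next one
    if (bitCount === 5) {
      hash += GEOHASH_BASE32[charIndex]
      bitCount = 0
      charIndex = 0
    }
  }
  return hash
}
```

For example, `encodeGeohash(42.6, -5.6, 5)` gives `'ezs42'`, and any point inside that cell produces a longer hash that starts with `ezs42`.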

Querying location data with geohash.

On the backend, the map data has to be queried and clustered, and the result simply returned to the client in the HTTP response.

So how do the client and server communicate in order to query by geohash? Let's look at each side.

Client side.

Map SDKs such as the Mapbox SDK, Kakao Map SDK, or Naver Map SDK can give you the boundary coordinates of the area currently shown to the user (the bounds). In code these usually appear as SouthWestLng, SouthWestLat, NorthEastLng, and NorthEastLat. Once you have the bounds, install the following library:


$ yarn add ngeohash

With ngeohash installed, compute every geohash that covers those bounds:


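import ngeohash from 'ngeohash'

// swlat, swlng, nelat, nelng: the visible bounds reported by the map SDK
// precision: chosen from the current zoom level (e.g. with getGeohashPrecision)
// ngeohash.bboxes returns every geohash string of that length covering the box
const bboxes = ngeohash.bboxes(
  swlat,
  swlng,
  nelat,
  nelng,
  Math.max(precision - 2, 1), // two characters coarser (fewer cells), but at least 1
)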

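Depending on the case, the precision has to be tuned to the map SDK's zoom level (a geohash supports a precision of up to 12 characters). Example code, assuming a larger zoom level means more zoomed in, as in Mapbox and Naver Map; Kakao Map's level increases as you zoom out, so it would need to be inverted: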


const getGeohashPrecision = (zoomLevel: number) => {
  if (zoomLevel <= 3) {
    return 2;
  }
  if (zoomLevel <= 6) {
    return 4;
  }
  if (zoomLevel <= 9) {
    return 6;
  }
  if (zoomLevel <= 12) {
    return 7;
  }
  return 8; // Zoom level 13+ gives higher precision
};
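The table above is one way to pick a precision. Because each geohash length corresponds to a fixed cell size in degrees, an alternative (an assumption, not part of the original setup or of any SDK) is to derive the precision from the visible bounds directly. `geohashCellSize` computes the cell size for a given length, and the hypothetical `precisionForBounds` picks the longest hash whose estimated number of covering cells stays within a caller-chosen `maxCells` budget.

```typescript
// Width/height in degrees of one geohash cell of the given length.
// A hash of length p has 5p bits alternating lng/lat (lng first),
// so longitude gets ceil(5p/2) bits and latitude floor(5p/2) bits.
function geohashCellSize(precision: number): { width: number; height: number } {
  const lngBits = Math.ceil((5 * precision) / 2)
  const latBits = Math.floor((5 * precision) / 2)
  return { width: 360 / 2 ** lngBits, height: 180 / 2 ** latBits }
}

// Hypothetical helper: the longest precision (1..12) whose covering of the
// bounds needs at most maxCells cells. The +1 per axis accounts for the
// viewport edges straddling cell boundaries, so the count is an upper bound.
function precisionForBounds(
  swLat: number,
  swLng: number,
  neLat: number,
  neLng: number,
  maxCells: number,
): number {
  let best = 1
  for (let p = 1; p <= 12; p++) {
    const { width, height } = geohashCellSize(p)
    const cols = Math.ceil((neLng - swLng) / width) + 1
    const rows = Math.ceil((neLat - swLat) / height) + 1
    if (cols * rows > maxCells) break
    best = p
  }
  return best
}
```

For a 1° × 1° viewport and a budget of 50 cells this returns 4. It does not handle bounds that cross the 180° meridian.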

The bboxes computed above is simply an array of geohashes: every geohash covering the bounds the user is currently viewing on the map. This array has to be sent to the server over HTTP.
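To make the `bboxes` result more concrete, here is a dependency-free sketch of the same idea. It is not ngeohash's actual implementation, and the helper names are hypothetical: each corner of the bounds is turned into an integer column/row index at the given precision, every cell between them is visited, and each cell's bits are interleaved (longitude first) into a base32 string. The nested loop also shows why the precision has to stay modest: the output grows with columns × rows.

```typescript
const BASE32 = '0123456789bcdefghjkmnpqrstuvwxyz'

// Bits per axis for a hash of `precision` characters (lng gets the extra odd bit)
function axisBits(precision: number): { lngBits: number; latBits: number } {
  const total = 5 * precision
  return { lngBits: Math.ceil(total / 2), latBits: Math.floor(total / 2) }
}

// Integer cell index of a value in [min, max] split into 2^bits equal slices
function cellIndex(value: number, min: number, max: number, bits: number): number {
  const slices = 2 ** bits
  const index = Math.floor(((value - min) / (max - min)) * slices)
  return Math.min(Math.max(index, 0), slices - 1) // clamp the max edge (e.g. lng = 180)
}

// Interleave column (lng) and row (lat) bits, lng first, into base32 characters
function cellToGeohash(col: number, row: number, precision: number): string {
  const { lngBits, latBits } = axisBits(precision)
  let lngBit = lngBits - 1
  let latBit = latBits - 1
  let value = 0
  let hash = ''
  for (let i = 0; i < 5 * precision; i++) {
    const bit = i % 2 === 0 ? (col >> lngBit--) & 1 : (row >> latBit--) & 1
    value = value * 2 + bit
    if (i % 5 === 4) {
      hash += BASE32[value]
      value = 0
    }
  }
  return hash
}

// All geohash cells of the given precision that intersect the bounding box
function coveringGeohashes(
  swLat: number,
  swLng: number,
  neLat: number,
  neLng: number,
  precision: number,
): string[] {
  const { lngBits, latBits } = axisBits(precision)
  const colStart = cellIndex(swLng, -180, 180, lngBits)
  const colEnd = cellIndex(neLng, -180, 180, lngBits)
  const rowStart = cellIndex(swLat, -90, 90, latBits)
  const rowEnd = cellIndex(neLat, -90, 90, latBits)

  const hashes: string[] = []
  for (let row = rowStart; row <= rowEnd; row++) {
    for (let col = colStart; col <= colEnd; col++) {
      hashes.push(cellToGeohash(col, row, precision))
    }
  }
  return hashes
}
```

For instance, a box from (0, -1) to (1, 1) at precision 1 straddles the prime meridian and yields `['e', 's']`.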

There are several ways to send it, for example as a query string; the exact format is up to whoever implements the server.
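As one possible convention (an assumption; the path `/locations/clusters` and the parameter name `geohashes` are made up for this example), the client can send the hashes as one comma-separated query parameter, and the server can split and validate it before building the Prisma query. Checking each entry against the geohash base32 alphabet keeps arbitrary strings out of the `startsWith` filter.

```typescript
// 1 to 12 characters from the geohash base32 alphabet
const GEOHASH_PATTERN = /^[0-9bcdefghjkmnpqrstuvwxyz]{1,12}$/

// Client: attach the visible geohashes to a hypothetical endpoint as ?geohashes=a,b,c
function buildClusterUrl(baseUrl: string, geohashes: string[]): string {
  const url = new URL('/locations/clusters', baseUrl)
  url.searchParams.set('geohashes', geohashes.join(','))
  return url.toString()
}

// Server: split the parameter, normalize case, drop malformed entries and duplicates
function parseGeohashes(raw: string | null | undefined): string[] {
  if (!raw) return []
  return raw
    .split(',')
    .map((hash) => hash.trim().toLowerCase())
    .filter((hash) => GEOHASH_PATTERN.test(hash))
    .filter((hash, index, all) => all.indexOf(hash) === index)
}
```

`URLSearchParams` percent-encodes the commas (`%2C`); reading the parameter back with `searchParams.get` (or a framework's query parser) returns the decoded string.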

Server side.

Parsing the geohash values out of the query string gives the server the array of geohashes the client asked for. Query with it as follows:


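// database: the PrismaClient instance; geohashes: prefixes parsed from the query string
const data = await database.location.findMany({
  where: {
    // a location matches if its geohash starts with any of the requested prefixes
    OR: geohashes.map((prefix) => ({
      geohash: { startsWith: prefix },
    })),
  },
})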

This returns every location whose geohash starts with one of the requested prefixes. Clustering is what remains; for that, the ml-kmeans library is said to work well.


import { kmeans } from 'ml-kmeans'

const limit = 20
const kmeansData = kmeans(
	data.map((item) => [Number(item.lat), Number(item.lng)]),
	limit,
	{}
)
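Under the hood, `kmeans` splits the `[lat, lng]` pairs into K groups. For intuition, here is a minimal, dependency-free sketch of Lloyd's algorithm, the classic k-means loop; it is not ml-kmeans' implementation. One simplification to note: the sketch seeds the centroids with the first K points so the result is reproducible, whereas ml-kmeans offers smarter seeding such as k-means++. It treats latitude and longitude as plain Euclidean coordinates, which is a reasonable approximation only over small, viewport-sized areas.

```typescript
type Point = [number, number] // [lat, lng]

// Minimal Lloyd's k-means: assign each point to its nearest centroid, move each
// centroid to the mean of its points, and repeat until assignments stop changing.
function simpleKMeans(
  points: Point[],
  k: number,
  maxIterations = 100,
): { clusters: number[]; centroids: Point[] } {
  if (!Number.isInteger(k) || k < 1 || k > points.length) {
    throw new Error('k must be an integer between 1 and the number of points')
  }
  // Deterministic seeding with the first k points (unlike k-means++)
  let centroids: Point[] = points.slice(0, k).map(([lat, lng]): Point => [lat, lng])
  let clusters: number[] = points.map(() => -1)

  for (let iteration = 0; iteration < maxIterations; iteration++) {
    // Assignment step: nearest centroid by squared Euclidean distance
    const next = points.map(([lat, lng]) => {
      let nearest = 0
      let nearestDist = Infinity
      centroids.forEach(([cLat, cLng], index) => {
        const dist = (lat - cLat) ** 2 + (lng - cLng) ** 2
        if (dist < nearestDist) {
          nearestDist = dist
          nearest = index
        }
      })
      return nearest
    })

    const changed = next.some((cluster, i) => cluster !== clusters[i])
    clusters = next
    if (!changed) break

    // Update step: each centroid moves to the mean of its members (empty clusters stay put)
    centroids = centroids.map((centroid, index): Point => {
      const members = points.filter((_, i) => clusters[i] === index)
      if (members.length === 0) return centroid
      const sumLat = members.reduce((sum, [lat]) => sum + lat, 0)
      const sumLng = members.reduce((sum, [, lng]) => sum + lng, 0)
      return [sumLat / members.length, sumLng / members.length]
    })
  }
  return { clusters, centroids }
}
```

Its `clusters` array has the same meaning as the per-point cluster index array in the ml-kmeans result: `clusters[i]` is the group assigned to point `i`.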

From the lat/lng data, the limit value decides how many clusters are produced. Choosing that value is the tricky part.


// Fewer than 100 points: every point becomes its own cluster.
// Otherwise: one cluster per requested geohash cell, never more than the number of points.
const limit =
  data.length < 100
    ? data.length
    : Math.min(data.length, geohashes?.length ?? 0)

kmeans throws an error when limit is 0 or larger than the number of data points, so the value has to be kept within range as above. If the clustered result needs to carry extra data, attach it like this:


// limit is computed as above.
// kmeans also throws for K = 0, so bail out early when nothing is in view.
if (limit === 0) return []

const kmeansData = kmeans(
  data.map((item) => [Number(item.lat), Number(item.lng)]),
  limit,
  {},
)

// kmeansData.clusters[i] is the cluster index assigned to data[i]
const clusterData = kmeansData.clusters.map((clusterId, index) => ({
  // data holds the location table rows queried through Prisma
  ...data[index],
  clusterId,
}))

Now the rows queried through Prisma and their cluster assignments are combined into one array.

If post-processing is needed, for example collapsing each cluster into a single pin, group the rows by clusterId:


// Group the rows by cluster, keeping coordinate sums so each cluster's center can be averaged later
type LocationPinGroup = {
  sumLat: number
  sumLng: number
  locations: { title: string; lat: number; lng: number }[]
}

const locationPins = clusterData.reduce<Record<string, LocationPinGroup>>(
  (acc, cluster) => {
    const { clusterId } = cluster
    if (!acc[clusterId]) {
      acc[clusterId] = { sumLat: 0, sumLng: 0, locations: [] }
    }

    const lat = cluster.lat ? Number(cluster.lat) : 0
    const lng = cluster.lng ? Number(cluster.lng) : 0
    acc[clusterId].sumLat += lat
    acc[clusterId].sumLng += lng
    acc[clusterId].locations.push({ title: cluster.title, lat, lng })

    return acc
  },
  {},
)

Finally, map locationPins to DTOs and return the serialized data as the response.


// LocationPinDTO is the project's response DTO class (definition not shown)
return Object.values(locationPins).map(
  ({ sumLat, sumLng, locations }) =>
    new LocationPinDTO({
      // cluster center = mean of its members' coordinates
      lat: sumLat / locations.length,
      lng: sumLng / locations.length,
      count: locations.length,
      locations,
    }),
)

The client then parses this response and draws it with the map SDK: if a cluster's count is 1, render it as a regular pin; if it is greater than 1, render it as a cluster marker.
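The count check can be done once when the response arrives. The sketch below assumes the response is an array of objects with the `LocationPinDTO` fields used above (`lat`, `lng`, `count`, `locations`); the `Marker` union and `toMarkers` are hypothetical names, and drawing the markers is left to whichever map SDK is in use.

```typescript
type LocationSummary = { title: string; lat: number; lng: number }

// Assumed response shape, mirroring the LocationPinDTO fields
type LocationPinResponse = {
  lat: number
  lng: number
  count: number
  locations: LocationSummary[]
}

// What the map layer should draw: a regular pin or a cluster marker
type Marker =
  | { kind: 'pin'; lat: number; lng: number; title: string }
  | { kind: 'cluster'; lat: number; lng: number; count: number }

function toMarkers(pins: LocationPinResponse[]): Marker[] {
  return pins.map((pin): Marker => {
    if (pin.count === 1) {
      // One location: its averaged center is the location itself, so draw a regular pin
      return { kind: 'pin', lat: pin.lat, lng: pin.lng, title: pin.locations[0]?.title ?? '' }
    }
    // Several locations: one cluster marker at the averaged center, labeled with the count
    return { kind: 'cluster', lat: pin.lat, lng: pin.lng, count: pin.count }
  })
}
```

Keeping the decision in a plain function like this makes it testable without a map instance, and the SDK-specific marker code only has to switch on `kind`.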