티스토리 뷰

deep learning

이미지 임베딩이란?

주먹불끈 2023. 4. 13. 15:17

개요

페이스북이 발표한 SAM 관련한 데모를 둘러보던 중 개념을 듣게 되어 정리함

이미지 임베딩(Image Embedding)이란 이미지를 저차원 공간에 수학적으로 표현한 것을 말한다. 신경망에 이미지를 넣어주고 특정 레이어에서의 출력을 임베딩으로 하는 것이다. 이러한 임베딩은 이미지의 특정하고, 중요한 feature를 잡아낸 것이라 보면 된다. 압축된 형태이기에 처리하기에 쉽고 빠르다.

이미지 임베딩은 심지어 NLP(Natural Language Processing)에서도 쓸 수 있는데 이미지와 이미지에 대응하는 문장 표현을 동일한 잠재적인 공간(latent space)에 임베딩할 수 있게 되는 것이다.

ChatGPT 결과

prompt: What is an image embedding?

 

An image embedding is a mathematical representation of an image in a low-dimensional space. It is created by feeding an image through a neural network and using the output of a particular layer in the network as the embedding. The embedding is designed to capture some of the important features of the image in a compressed form, making it easier and faster to process.

The use of embeddings has become increasingly popular in machine learning, including in computer vision tasks such as object recognition and image retrieval. With a big enough set of image embeddings, it becomes possible to build applications that can search for a plant using pictures of its flower or leaves, find similar images on the web, or search for similarly-looking products [3].

One approach to generating image embeddings is to use a pre-trained convolutional neural network (CNN). The CNN processes the input image and produces a feature map, which can then be flattened and passed through a fully-connected layer to generate the final embedding. Another approach is to use a Siamese network, which takes in two images and outputs a similarity score between them. The network can be trained to optimize the embeddings such that similar images have a high similarity score [2].

Image embeddings can be used in a variety of applications beyond computer vision, such as natural language processing (NLP). In these cases, an image and a corresponding text description can be translated into embeddings in the same latent space, allowing for cross-modal retrieval and manipulation of the data [2].

Overall, image embeddings provide a way to represent images in a more efficient and useful way for machine learning applications.

반응형

'deep learning' 카테고리의 다른 글

윈도우즈에서 yolo v3 돌려보기 2/2  (19) 2018.08.02
윈도우즈에서 yolo v3 돌려보기 1/2  (0) 2018.08.02
반응형
잡학툰 뱃지
최근에 올라온 글
최근에 달린 댓글
Total
Today
Yesterday
«   2024/05   »
1 2 3 4
5 6 7 8 9 10 11
12 13 14 15 16 17 18
19 20 21 22 23 24 25
26 27 28 29 30 31
글 보관함