CSV file chunker for building RAG on Snowflake

오지's blog

잡스러운노트, 잡스노트 2024. 10. 10. 13:50

create or replace function csv_text_chunker(file_url string)
returns table (chunk varchar)
language python
runtime_version = '3.9'
handler = 'csv_text_chunker'
packages = ('snowflake-snowpark-python','pandas', 'langchain')
as
$$
from snowflake.snowpark.types import StringType, StructField, StructType
from langchain.text_splitter import RecursiveCharacterTextSplitter
from snowflake.snowpark.files import SnowflakeFile
import io
import logging
import pandas as pd

class csv_text_chunker:

    def read_csv_chunk(self, file_url: str) -> str:
    
        logger = logging.getLogger("udf_logger")
        logger.info(f"Opening file {file_url}")
    
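        # Read the staged file fully into memory, then parse it with pandas
        with SnowflakeFile.open(file_url, 'rb') as f:
            buffer = io.BytesIO(f.readall())
        df = pd.read_csv(buffer)
        # Join every cell into one space-separated string; read_csv treats the first row as the header, so column names are not part of the text
        text = " ".join(df.astype(str).values.flatten())
        return text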

    def process(self,file_url: str):

        text = self.read_csv_chunk(file_url)
        
        text_splitter = RecursiveCharacterTextSplitter(
            chunk_size = 4000, # Maximum characters per chunk; adjust as you see fit
            chunk_overlap  = 400, # Lets consecutive chunks share some text, which helps keep each chunk contextual
            length_function = len
        )
    
        chunks = text_splitter.split_text(text)
        df = pd.DataFrame(chunks, columns=['chunks'])
        
        yield from df.itertuples(index=False, name=None)
$$;

https://quickstarts.snowflake.com/guide/asking_questions_to_your_own_documents_with_snowflake_cortex

Build A Document Search Assistant using Vector Embeddings in Cortex AI

Following the guide, I built a CSV file chunker instead of the PDF file chunker.
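
To try the flatten-then-chunk idea locally without Snowflake, here is a minimal sketch using only the Python standard library. It is not the UDTF itself and it is not how RecursiveCharacterTextSplitter works internally: `flatten_csv` and `sliding_chunks` are hypothetical helper names, the splitter is swapped for a plain fixed-size sliding window, and the sample CSV is made up. It only illustrates what `chunk_size` and `chunk_overlap` mean. Unlike pandas, the `csv` module does no type inference and does not turn empty cells into `nan`.

```python
import csv
import io


def flatten_csv(csv_text: str) -> str:
    """Join every data cell into one space-separated string.

    The first row is skipped to mimic pd.read_csv, which uses it as the header.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    cells = [cell for row in rows[1:] for cell in row]
    return " ".join(cells)


def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into windows of at most chunk_size characters.

    Each window starts chunk_size - chunk_overlap characters after the
    previous one, so neighbours share chunk_overlap characters.
    This is a simplified stand-in, not langchain's recursive splitting.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


# Made-up sample data, small enough to inspect by eye
sample = "id,name,city\n1,Kim,Seoul\n2,Lee,Busan\n"
text = flatten_csv(sample)
chunks = sliding_chunks(text, chunk_size=10, chunk_overlap=3)

for chunk in chunks:
    print(repr(chunk))
```

Because the whole file is flattened into one string before splitting, a chunk boundary can fall in the middle of a row or even a cell. The overlap softens this, but if row boundaries matter for retrieval, chunking row by row may be worth considering.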