Writing a CSV to a bucket from a Cloud Function (CF): 'with open(filepath, "w") as MY_CSV:' raises "FileNotFoundError: [Errno 2] No such file or directory:"

Answer 1

The solution is surprising: to write the file you must use gcsfs and its open(), not the built-in open().
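To see why the built-in open() fails here, the following is a minimal local sketch, not the original Cloud Function. It assumes a hypothetical bucket-style path (my-bucket/exports/data.csv) and a temporary directory standing in for the function's file system. The built-in open() only resolves local paths, so a path whose parent directory does not exist locally ends in FileNotFoundError.

```python
import os
import tempfile

# Hypothetical bucket-style path. On a Cloud Function the bucket is not
# a local directory, so this path has no existing parent folder.
BUCKET_STYLE_PATH = os.path.join("my-bucket", "exports", "data.csv")


def try_builtin_open(base_dir, relative_path):
    """Try to create a file with the built-in open() and report the outcome."""
    full_path = os.path.join(base_dir, relative_path)
    try:
        with open(full_path, "w") as handle:
            handle.write("a|b\n")
        return "written"
    except FileNotFoundError as exc:
        # The parent directory does not exist locally.
        return f"FileNotFoundError: errno {exc.errno}"


with tempfile.TemporaryDirectory() as tmp:
    result = try_builtin_open(tmp, BUCKET_STYLE_PATH)

print(result)
```

A file-system object such as gcsfs.GCSFileSystem avoids this because its open() addresses the bucket directly instead of the local disk.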

If you use pd.to_csv(), "import gcsfs" is not needed, but gcsfs must still be listed in requirements.txt for pd.to_csv() to work, so pandas to_csv() seems to use it automatically.

Setting the pd.to_csv() surprise aside, here is the code that answers the question (tested):

import csv
from datetime import datetime

import gcsfs

# MY_PROJECT, BUCKET_NAME, FIELDNAMES and query_execute_batch()
# are expected to be defined elsewhere in the module.


def write_to_csv_file(connection, filepath):
    """Write the QUERY result in a loop over batches into a csv.
    This is done in batches since the query from the database is huge.
    :param connection: mysqldb connection to DB
    :param filepath: path to csv file to write data
    :return: metadata on rows and time
    """
    countrows = 0
    print("Right before opening the file ...")

    # A gcsfs object is needed to open a file.
    # https://stackoverflow.com/questions/52805016/how-to-open-a-file-from-google-cloud-storage-into-a-cloud-function
    # https://gcsfs.readthedocs.io/en/latest/index.html#examples
    # Side-note (Exception):
    # pd.to_csv() needs neither the gcsfs object, nor its import.
    # It is not used here, but it has been tested with examples.
    fs = gcsfs.GCSFileSystem(project=MY_PROJECT)
    fs.ls(BUCKET_NAME)

    # wb needed, else "builtins.TypeError: must be str, not bytes"
    # https://stackoverflow.com/questions/5512811/builtins-typeerror-must-be-str-not-bytes
    with fs.open(filepath, 'wb') as outcsv:
        print("Right after opening the file ...")

        writer = csv.DictWriter(
            outcsv,
            fieldnames=FIELDNAMES,
            extrasaction="ignore",
            delimiter="|",
            lineterminator="\n",
        )
        # write header according to fieldnames
        print("before writer.writeheader()")
        writer.writeheader()
        print("after writer.writeheader()")

        # write the rows batch by batch and count them
        for batch in query_execute_batch(connection):
            writer.writerows(batch)
            countrows += len(batch)
        datetime_now_save = datetime.now()
    return countrows, datetime_now_save
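
The code comment reports that 'wb' was needed in the author's setup. csv.DictWriter itself emits str, so if a given binary handle only accepts bytes, one option is to wrap it in io.TextIOWrapper. The sketch below is an assumption, not part of the tested answer: write_rows_as_text is a hypothetical helper, and an in-memory io.BytesIO stands in for the handle returned by fs.open(filepath, 'wb').

```python
import csv
import io


def write_rows_as_text(binary_handle, fieldnames, batches):
    """Write batches of dict rows to a bytes-only handle via a text wrapper."""
    # newline="" leaves line endings to the csv module.
    text_handle = io.TextIOWrapper(binary_handle, encoding="utf-8", newline="")
    writer = csv.DictWriter(
        text_handle,
        fieldnames=fieldnames,
        extrasaction="ignore",  # drop keys that are not in fieldnames
        delimiter="|",
        lineterminator="\n",
    )
    writer.writeheader()
    count = 0
    for batch in batches:
        writer.writerows(batch)
        count += len(batch)
    text_handle.flush()
    # Detach so that closing stays the caller's responsibility.
    text_handle.detach()
    return count


# Stand-in for the bucket file handle.
buffer = io.BytesIO()
sample_batches = [
    [{"id": 1, "name": "a", "extra": "ignored"}],
    [{"id": 2, "name": "b"}],
]
rows_written = write_rows_as_text(buffer, ["id", "name"], sample_batches)
print(rows_written, buffer.getvalue())
```

The wrapper encodes each line to UTF-8 before it reaches the underlying handle, so the csv writer and the binary file no longer disagree about str versus bytes.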

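Side note

Do not use the csv writer like this.

pd.to_csv() with a chunksize parameter of 5000 needs only 62 seconds to load 700,000 rows and store them as a CSV in the bucket, whereas the CF with the batch writer takes more than 9 minutes, which exceeds the timeout limit. You should therefore use pd.to_csv() instead and convert the data into a DataFrame.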
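
A minimal sketch of the pd.to_csv() route, assuming the batches are lists of dict rows like those fed to the csv writer. batches_to_csv and the sample data are hypothetical names for illustration, and a local temporary file stands in for the bucket so the example runs anywhere. On a Cloud Function the target would instead be a gs:// URL such as "gs://my-bucket/data.csv" (a made-up bucket name), which pandas can only write when gcsfs is installed.

```python
import os
import tempfile

import pandas as pd


def batches_to_csv(batches, target, fieldnames, chunksize=5000):
    """Collect dict-row batches into one DataFrame and write it as a CSV."""
    rows = [row for batch in batches for row in batch]
    # columns= keeps only the wanted fields, in the given order.
    frame = pd.DataFrame(rows, columns=fieldnames)
    # chunksize controls how many rows pandas writes at a time.
    frame.to_csv(target, sep="|", index=False, chunksize=chunksize)
    return len(frame)


sample_batches = [
    [{"id": 1, "name": "a"}],
    [{"id": 2, "name": "b", "extra": "ignored"}],
]

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "out.csv")
    written = batches_to_csv(sample_batches, target, ["id", "name"])
    with open(target, newline="") as handle:
        content = handle.read()

print(written)
print(content)
```

Note that this builds the whole DataFrame in memory before writing, so it trades memory for the speed reported above.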
