[AI Assignment - Python] 🐍 Generators in Practice: batch, file reader, infinite ID

🎯 Assignment overview

I implemented three generators that come up often in data pipelines.

  1. batch_generator: split data into batches
  2. file_line_reader: read a large file one line at a time (with an option for handling empty lines)
  3. infinite_id_generator: produce an unbounded sequence of unique IDs

Requirements: type hints on every function, Generator or Iterator as the return type, FileNotFoundError raised when the file is missing, and tests using both next() and for.


📌 Generator basics

A generator is a function that does not build all of its values at once; it produces them one at a time, as they are requested. It uses yield instead of return.

# Regular function: builds the entire list in memory
def get_numbers(n: int) -> list[int]:
    result = []
    for i in range(n):
        result.append(i ** 2)
    return result                    # with 100 million items, memory use explodes

# Generator: produces one value at a time
def gen_numbers(n: int):
    for i in range(n):
        yield i ** 2                 # even with 100 million items, only one value is held at a time
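
As a rough illustration of the difference (a sketch, not a benchmark), the snippet below compares the size of a fully built list with the size of a generator object. The two functions are redefined so the block runs on its own, the value of n is an arbitrary choice, and sys.getsizeof reports only the container itself, not the integers it refers to.

```python
import sys
from collections.abc import Iterator


def get_numbers(n: int) -> list[int]:
    """Build every square up front and return them as a list."""
    return [i ** 2 for i in range(n)]


def gen_numbers(n: int) -> Iterator[int]:
    """Yield squares one at a time."""
    for i in range(n):
        yield i ** 2


n = 100_000  # arbitrary size chosen for illustration
as_list = get_numbers(n)
as_gen = gen_numbers(n)

# The list's size grows with n; the generator object's size does not.
list_size = sys.getsizeof(as_list)
gen_size = sys.getsizeof(as_gen)
print(list_size > gen_size)  # True

# Values are produced only when requested.
first = next(as_gen)   # 0
second = next(as_gen)  # 1

# Anything that iterates (a for loop, sum, list) consumes what is left.
remaining_total = sum(as_gen)
```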

1️⃣ batch_generator

First attempt

def batch_generator(data: list, batch_size: int):
    for i in range(len(data)):
        if i * batch_size >= len(data):
            return
        yield data[i * batch_size:i * batch_size + batch_size]

This behaves correctly, but range(len(data)) offers far more indices than the loop needs. With 10 items and a batch size of 3, the loop is set up over range(10), yet only 4 iterations produce a batch, and the guard has to return on the fifth.

✅ Improvement: use the step argument of range

The third argument of range(start, stop, step) sets the increment.

from collections.abc import Iterator

def batch_generator(data: list, batch_size: int) -> Iterator[list]:
    for i in range(0, len(data), batch_size):    # 0, 3, 6, 9
        yield data[i:i + batch_size]

Slicing past the end of a list does not raise an error; the slice is simply cut short (data[9:12] → [9]). That makes both the conditional and the return unnecessary.
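
The two ways of consuming the generator named in the requirements, next() and for, can be sketched as follows. The sample data (the integers 0 to 9) and the batch size of 3 are assumptions chosen to match the comment in the code above.

```python
from collections.abc import Iterator


def batch_generator(data: list, batch_size: int) -> Iterator[list]:
    """Yield consecutive slices of data, each at most batch_size long."""
    for i in range(0, len(data), batch_size):
        yield data[i:i + batch_size]


data = list(range(10))

# Slices that run past the end are clamped rather than raising IndexError.
tail = data[9:12]
print(tail)  # [9]

# Manual consumption with next()
batches = batch_generator(data, 3)
first = next(batches)   # [0, 1, 2]
second = next(batches)  # [3, 4, 5]

# A for loop picks up where next() left off.
rest = [batch for batch in batches]
print(rest)  # [[6, 7, 8], [9]]

# Once the generator is exhausted, next() raises StopIteration.
try:
    next(batches)
    exhausted = False
except StopIteration:
    exhausted = True
```

One thing to watch: a batch_size of 0 makes range() raise ValueError, and because this is a generator the error only appears on the first next() call, so validating the argument may be worthwhile if callers can pass arbitrary values.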


2️⃣ file_line_reader

This is the function I struggled with the most, so I'll go through it attempt by attempt.

First attempt: two problems

def file_line_reader(filepath: str, skip_empty: bool = True):
    try:
        with open(filepath, 'r', encoding='utf-8') as file:
            for line in file:
                yield line.strip()
    except FileNotFoundError:
        print("File not found. Please check the path.")

① skip_empty is not implemented. The parameter is accepted but never used, so empty lines are yielded as they are.

② The error is covered up with print. The requirement was to raise FileNotFoundError, but the code caught it with try/except and handled it quietly.

# From the caller's point of view
for line in file_line_reader("nonexistent.csv"):
    process(line)
# → prints "File not found." and the for loop body runs 0 times
# → the bug stays silently hidden

⚠️ If an error is covered up with print, the caller has no way of knowing what went wrong. Don't hide errors; raise them and let the caller handle them.
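
A small sketch of what that looks like from the caller's side. The names quiet_reader and strict_reader are hypothetical, introduced only for this comparison, and the missing path is built inside a temporary directory so it is guaranteed not to exist.

```python
import os
import tempfile
from collections.abc import Iterator


def quiet_reader(filepath: str) -> Iterator[str]:
    """Hypothetical variant that swallows the error, like the first attempt."""
    try:
        with open(filepath, "r", encoding="utf-8") as file:
            for line in file:
                yield line.strip()
    except FileNotFoundError:
        print("File not found. Please check the path.")


def strict_reader(filepath: str) -> Iterator[str]:
    """Hypothetical variant that lets open() raise."""
    with open(filepath, "r", encoding="utf-8") as file:
        for line in file:
            yield line.strip()


with tempfile.TemporaryDirectory() as tmp_dir:
    missing = os.path.join(tmp_dir, "nonexistent.csv")

    # The caller gets an empty result and cannot tell it from an empty file.
    quiet_result = list(quiet_reader(missing))

    # The caller gets an exception it can act on.
    try:
        list(strict_reader(missing))
        strict_raised = False
    except FileNotFoundError:
        strict_raised = True

print(quiet_result, strict_raised)  # [] True
```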

Second attempt: misunderstanding what skip_empty means

import os

def file_line_reader(filepath: str, skip_empty: bool = True):
    if os.path.exists(filepath):
        with open(filepath, 'r', encoding='utf-8') as file:
            for line in file:
                if skip_empty:
                    yield line.strip()
                else:
                    yield line
    else:
        print(f"File '{filepath}' not found.")

I switched to checking with os.path.exists first, but when the file is missing the function still only prints and ends quietly. On top of that, I completely misunderstood what skip_empty means.

What I implemented              | What it actually means
True  → apply strip()           | True  → skip empty lines
False → do not apply strip()    | False → yield empty lines too

strip() should always be applied, regardless of skip_empty. skip_empty is the flag that decides whether a line that is an empty string after strip() gets skipped.

✅ Third attempt (final): the EAFP pattern

from collections.abc import Iterator

def file_line_reader(filepath: str, skip_empty: bool = True) -> Iterator[str]:
    with open(filepath, 'r', encoding='utf-8') as file:
        for line in file:
            stripped = line.strip()
            if skip_empty and not stripped:    # an empty string is falsy
                continue
            yield stripped

Applying the EAFP pattern. Instead of checking with os.path.exists first (LBYL), the function lets open() raise FileNotFoundError on its own. This is the Python idiom known as EAFP (Easier to Ask for Forgiveness than Permission). It also avoids the gap between the check and the open, during which the file could disappear.

Implementing skip_empty correctly. strip() is always applied, and an empty result is skipped with continue. In Python an empty string is falsy, so not stripped is enough for the check.

Tip: EAFP ("just try it, and handle the exception if it fails") is generally considered more Pythonic than LBYL ("check whether it will work before doing it").
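
To check both skip_empty modes without depending on a real data file, the sketch below writes a small temporary file. The file name sample.csv and its contents are made up for the test. It also shows a detail that is easy to miss: calling a generator function does not run its body, so the FileNotFoundError surfaces on the first next(), not on the call itself.

```python
import os
import tempfile
from collections.abc import Iterator


def file_line_reader(filepath: str, skip_empty: bool = True) -> Iterator[str]:
    with open(filepath, "r", encoding="utf-8") as file:
        for line in file:
            stripped = line.strip()
            if skip_empty and not stripped:
                continue
            yield stripped


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.csv")  # hypothetical file name
    with open(path, "w", encoding="utf-8") as f:
        f.write("a,1\n\n  b,2  \n\n")

    skipped = list(file_line_reader(path))                 # ['a,1', 'b,2']
    kept = list(file_line_reader(path, skip_empty=False))  # ['a,1', '', 'b,2', '']

    # Creating the generator does not call open() yet, so nothing is raised here.
    reader = file_line_reader(os.path.join(tmp_dir, "nonexistent.csv"))
    try:
        next(reader)  # the body starts running here, so the error surfaces here
        raised_on_next = False
    except FileNotFoundError:
        raised_on_next = True
```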


3️⃣ infinite_id_generator

from collections.abc import Iterator

def infinite_id_generator(prefix: str = "item") -> Iterator[str]:
    count = 0
    while True:
        count += 1
        yield f"{prefix}_{count:04d}"

while True combined with yield produces values indefinitely. The format spec in f"{count:04d}" zero-pads the number to four digits. Because a generator keeps its local state between next() calls, count accumulates on its own.
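
Since the generator never finishes, a plain for loop over it would run forever unless it breaks out. One way to take a bounded number of IDs is itertools.islice, sketched below. The prefixes "user" and "order" are just example values.

```python
from collections.abc import Iterator
from itertools import islice


def infinite_id_generator(prefix: str = "item") -> Iterator[str]:
    count = 0
    while True:
        count += 1
        yield f"{prefix}_{count:04d}"


user_ids = infinite_id_generator("user")
first = next(user_ids)   # 'user_0001'
second = next(user_ids)  # 'user_0002'

# islice takes a bounded number of items from an unbounded generator.
next_three = list(islice(user_ids, 3))  # ['user_0003', 'user_0004', 'user_0005']

# Each call creates an independent generator with its own count.
order_ids = infinite_id_generator("order")
first_order = next(order_ids)  # 'order_0001'

# 04d sets a minimum width, so larger numbers are not truncated.
wide = f"{12345:04d}"  # '12345'
```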


▶️ Output

=== batch_generator ===
[0, 1, 2]
[3, 4, 5]
[6, 7, 8]
[9]

=== file_line_reader(skip_empty=True) ===
'서울,15.2,45,32.1'
'부산,17.8,61,22.3'
'대구,19.3,38,41.5'
'인천,14.7,55,29.8'

=== file_line_reader(skip_empty=False) ===
'서울,15.2,45,32.1'
'부산,17.8,61,22.3'
''
'대구,19.3,38,41.5'
''
'인천,14.7,55,29.8'

=== Missing file ===
FileNotFoundError: [Errno 2] No such file or directory: 'nonexistent.csv'

=== infinite_id_generator ===
user_0001
user_0002
user_0003

🏷️ Type hints: Generator vs Iterator

Generator takes three type parameters.

Generator[YieldType, SendType, ReturnType]
  • YieldType: the type of the values produced by yield
  • SendType: the type of the values passed into the generator with .send()
  • ReturnType: the type of the value the generator returns when it finishes

In most cases only YieldType matters, so the other two are set to None. A more concise alternative is Iterator.

from collections.abc import Iterator

def batch_generator(data: list, batch_size: int) -> Iterator[list]:
    ...

Tip: if you have no use for .send() or a return value, Iterator[T] is much cleaner than Generator[T, None, None].
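
For contrast, here is a sketch of a case where all three parameters carry information. Both functions (countdown and running_total) are hypothetical examples, not part of the assignment. The first only yields, so Iterator[int] is enough; the second receives values through send() and returns a string when it finishes.

```python
from collections.abc import Generator, Iterator


def countdown(n: int) -> Iterator[int]:
    """Only yields values, so Iterator[int] is enough."""
    while n > 0:
        yield n
        n -= 1


def running_total(limit: int) -> Generator[int, int, str]:
    """Yields int, accepts int via send(), returns str when done."""
    total = 0
    while total < limit:
        received = yield total  # the value passed to send() arrives here
        total += received
    return f"stopped at {total}"


counts = list(countdown(3))  # [3, 2, 1]

gen = running_total(10)
start = next(gen)        # runs to the first yield, which yields 0
after_4 = gen.send(4)    # total becomes 4, and 4 is yielded
try:
    gen.send(7)          # total becomes 11, the loop ends, the generator returns
    final = None
except StopIteration as stop:
    final = stop.value   # the return value travels on StopIteration
```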


💡 What I learned from this assignment

How yield works: when the function reaches a yield, it hands out the value and pauses. On the next call it resumes from that same spot.

range(start, stop, step): the increment can be set directly. A pattern like range(0, len(data), batch_size) is useful all over Python code.

The EAFP pattern: rather than checking with os.path.exists first (LBYL), it is more Pythonic to attempt open() and deal with the exception if one is raised (EAFP).

Don't hide errors: catching an exception with try/except and covering it with print makes debugging harder. Let the exception propagate so the caller can receive and handle it.

What a parameter like skip_empty means: don't guess from the name alone; read the requirements carefully.

Keep type hints concise: use Iterator[T] instead of Generator[T, None, None].


🔄 Recurring patterns

Two patterns kept coming up across the three assignments.

① Reliance on hardcoding: result == 'done', isinstance(arg, int), and the skip_empty misunderstanding all come from the habit of writing code that depends on one specific value or situation. When writing a general-purpose function, the abstraction has to be deliberate.

② Hiding errors: if an error is returned as a value, or printed before exiting quietly, the caller cannot tell what went wrong. Exceptions should be handled at the appropriate layer.

This post is licensed under CC BY 4.0 by the author.