annotate async_multi_threads/inference_my_version.py @ 69:551d9fc0a2ba

Updated wrong names.
author June Park <parkjune1995@gmail.com>
date Thu, 25 Dec 2025 20:07:46 -0800
parents
children
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
69
551d9fc0a2ba Updated wrong names.
June Park <parkjune1995@gmail.com>
parents:
diff changeset
1 """
551d9fc0a2ba Updated wrong names.
June Park <parkjune1995@gmail.com>
parents:
diff changeset
2 Inference Questions
551d9fc0a2ba Updated wrong names.
June Park <parkjune1995@gmail.com>
parents:
diff changeset
3
551d9fc0a2ba Updated wrong names.
June Park <parkjune1995@gmail.com>
parents:
diff changeset
4 Context
551d9fc0a2ba Updated wrong names.
June Park <parkjune1995@gmail.com>
parents:
diff changeset
5
551d9fc0a2ba Updated wrong names.
June Park <parkjune1995@gmail.com>
parents:
diff changeset
6 You are tasked with building a simplified inference engine component responsible for handling incoming user requests for a large language model (LLM). To optimize throughput and GPU utilization, the engine must batch multiple requests together, run the inference call once per batch, and then deconstruct the results to return token-level output to the individual users.
551d9fc0a2ba Updated wrong names.
June Park <parkjune1995@gmail.com>
parents:
diff changeset
7
551d9fc0a2ba Updated wrong names.
June Park <parkjune1995@gmail.com>
parents:
diff changeset
8 Objective
551d9fc0a2ba Updated wrong names.
June Park <parkjune1995@gmail.com>
parents:
diff changeset
9
551d9fc0a2ba Updated wrong names.
June Park <parkjune1995@gmail.com>
parents:
diff changeset
10 Complete the provided Python class, BatchInferenceEngine by implementing the methods necessary to:
551d9fc0a2ba Updated wrong names.
June Park <parkjune1995@gmail.com>
parents:
diff changeset
11
551d9fc0a2ba Updated wrong names.
June Park <parkjune1995@gmail.com>
parents:
diff changeset
12 Queue incoming user requests.
551d9fc0a2ba Updated wrong names.
June Park <parkjune1995@gmail.com>
parents:
diff changeset
13
551d9fc0a2ba Updated wrong names.
June Park <parkjune1995@gmail.com>
parents:
diff changeset
14 Process a batch when the queue reaches a defined batch size.
551d9fc0a2ba Updated wrong names.
June Park <parkjune1995@gmail.com>
parents:
diff changeset
15 Simulate the token-level output from an LLM and correctly associate each generated token with its original request.
551d9fc0a2ba Updated wrong names.
June Park <parkjune1995@gmail.com>
parents:
diff changeset
16 """
551d9fc0a2ba Updated wrong names.
June Park <parkjune1995@gmail.com>
parents:
diff changeset
17
551d9fc0a2ba Updated wrong names.
June Park <parkjune1995@gmail.com>
parents:
diff changeset
18 import asyncio
551d9fc0a2ba Updated wrong names.
June Park <parkjune1995@gmail.com>
parents:
diff changeset
19 from dataclasses import dataclass, field
551d9fc0a2ba Updated wrong names.
June Park <parkjune1995@gmail.com>
parents:
diff changeset
20 from time import time
551d9fc0a2ba Updated wrong names.
June Park <parkjune1995@gmail.com>
parents:
diff changeset
21 from typing import Dict, List
551d9fc0a2ba Updated wrong names.
June Park <parkjune1995@gmail.com>
parents:
diff changeset
22 import uuid
551d9fc0a2ba Updated wrong names.
June Park <parkjune1995@gmail.com>
parents:
diff changeset
23
551d9fc0a2ba Updated wrong names.
June Park <parkjune1995@gmail.com>
parents:
diff changeset
24 @dataclass
551d9fc0a2ba Updated wrong names.
June Park <parkjune1995@gmail.com>
parents:
diff changeset
25 class UserRequest:
551d9fc0a2ba Updated wrong names.
June Park <parkjune1995@gmail.com>
parents:
diff changeset
26 prompt: str
551d9fc0a2ba Updated wrong names.
June Park <parkjune1995@gmail.com>
parents:
diff changeset
27 id: str = field(default_factory=lambda: str(uuid.uuid4()))
551d9fc0a2ba Updated wrong names.
June Park <parkjune1995@gmail.com>
parents:
diff changeset
28 created_at: float = field(default_factory=time)
551d9fc0a2ba Updated wrong names.
June Park <parkjune1995@gmail.com>
parents:
diff changeset
29
551d9fc0a2ba Updated wrong names.
June Park <parkjune1995@gmail.com>
parents:
diff changeset
30
551d9fc0a2ba Updated wrong names.
June Park <parkjune1995@gmail.com>
parents:
diff changeset
31 @dataclass
551d9fc0a2ba Updated wrong names.
June Park <parkjune1995@gmail.com>
parents:
diff changeset
32 class TokenOutput:
551d9fc0a2ba Updated wrong names.
June Park <parkjune1995@gmail.com>
parents:
diff changeset
33 request_id: str
551d9fc0a2ba Updated wrong names.
June Park <parkjune1995@gmail.com>
parents:
diff changeset
34 token: bytes
551d9fc0a2ba Updated wrong names.
June Park <parkjune1995@gmail.com>
parents:
diff changeset
35
551d9fc0a2ba Updated wrong names.
June Park <parkjune1995@gmail.com>
parents:
diff changeset
36 class BatchInferenceEngine:
551d9fc0a2ba Updated wrong names.
June Park <parkjune1995@gmail.com>
parents:
diff changeset
37
551d9fc0a2ba Updated wrong names.
June Park <parkjune1995@gmail.com>
parents:
diff changeset
38 def __init__(self, batch_sizes: int = 8):
551d9fc0a2ba Updated wrong names.
June Park <parkjune1995@gmail.com>
parents:
diff changeset
39 self.queue = []
551d9fc0a2ba Updated wrong names.
June Park <parkjune1995@gmail.com>
parents:
diff changeset
40 self.request_token_map: Dict[str, str] = {}
551d9fc0a2ba Updated wrong names.
June Park <parkjune1995@gmail.com>
parents:
diff changeset
41 self.batch_sizes = batch_sizes
551d9fc0a2ba Updated wrong names.
June Park <parkjune1995@gmail.com>
parents:
diff changeset
42
551d9fc0a2ba Updated wrong names.
June Park <parkjune1995@gmail.com>
parents:
diff changeset
43 self._lock = asyncio.Lock()
551d9fc0a2ba Updated wrong names.
June Park <parkjune1995@gmail.com>
parents:
diff changeset
44 self._batch_event = asyncio.Event()
551d9fc0a2ba Updated wrong names.
June Park <parkjune1995@gmail.com>
parents:
diff changeset
45
551d9fc0a2ba Updated wrong names.
June Park <parkjune1995@gmail.com>
parents:
diff changeset
46 async def add_to_queue(self, request: UserRequest):
551d9fc0a2ba Updated wrong names.
June Park <parkjune1995@gmail.com>
parents:
diff changeset
47 async with self._lock:
551d9fc0a2ba Updated wrong names.
June Park <parkjune1995@gmail.com>
parents:
diff changeset
48 self.queue.append(request)
551d9fc0a2ba Updated wrong names.
June Park <parkjune1995@gmail.com>
parents:
diff changeset
49
551d9fc0a2ba Updated wrong names.
June Park <parkjune1995@gmail.com>
parents:
diff changeset
50 if len(self.queue) > self.batch_sizes:
551d9fc0a2ba Updated wrong names.
June Park <parkjune1995@gmail.com>
parents:
diff changeset
51 self._batch_event.set()
551d9fc0a2ba Updated wrong names.
June Park <parkjune1995@gmail.com>
parents:
diff changeset
52
551d9fc0a2ba Updated wrong names.
June Park <parkjune1995@gmail.com>
parents:
diff changeset
53 task = asyncio.create_task(self._batch())
551d9fc0a2ba Updated wrong names.
June Park <parkjune1995@gmail.com>
parents:
diff changeset
54 return task
551d9fc0a2ba Updated wrong names.
June Park <parkjune1995@gmail.com>
parents:
diff changeset
55
551d9fc0a2ba Updated wrong names.
June Park <parkjune1995@gmail.com>
parents:
diff changeset
56 async def _batch(self):
551d9fc0a2ba Updated wrong names.
June Park <parkjune1995@gmail.com>
parents:
diff changeset
57 while True:
551d9fc0a2ba Updated wrong names.
June Park <parkjune1995@gmail.com>
parents:
diff changeset
58 try:
551d9fc0a2ba Updated wrong names.
June Park <parkjune1995@gmail.com>
parents:
diff changeset
59 await asyncio.wait_for(self._batch_event.wait(), timeout=0.05)
551d9fc0a2ba Updated wrong names.
June Park <parkjune1995@gmail.com>
parents:
diff changeset
60 except:
551d9fc0a2ba Updated wrong names.
June Park <parkjune1995@gmail.com>
parents:
diff changeset
61 raise Exception("Timed out")
551d9fc0a2ba Updated wrong names.
June Park <parkjune1995@gmail.com>
parents:
diff changeset
62
551d9fc0a2ba Updated wrong names.
June Park <parkjune1995@gmail.com>
parents:
diff changeset
63 async with self._lock:
551d9fc0a2ba Updated wrong names.
June Park <parkjune1995@gmail.com>
parents:
diff changeset
64 if not self.queue:
551d9fc0a2ba Updated wrong names.
June Park <parkjune1995@gmail.com>
parents:
diff changeset
65 self._batch_event.clear()
551d9fc0a2ba Updated wrong names.
June Park <parkjune1995@gmail.com>
parents:
diff changeset
66 continue
551d9fc0a2ba Updated wrong names.
June Park <parkjune1995@gmail.com>
parents:
diff changeset
67
551d9fc0a2ba Updated wrong names.
June Park <parkjune1995@gmail.com>
parents:
diff changeset
68 batch = self.queue[:self.batch_sizes]
551d9fc0a2ba Updated wrong names.
June Park <parkjune1995@gmail.com>
parents:
diff changeset
69 tokens = await self._handle_prompt_to_token(batch)
551d9fc0a2ba Updated wrong names.
June Park <parkjune1995@gmail.com>
parents:
diff changeset
70
551d9fc0a2ba Updated wrong names.
June Park <parkjune1995@gmail.com>
parents:
diff changeset
71 return tokens
551d9fc0a2ba Updated wrong names.
June Park <parkjune1995@gmail.com>
parents:
diff changeset
72
551d9fc0a2ba Updated wrong names.
June Park <parkjune1995@gmail.com>
parents:
diff changeset
73
551d9fc0a2ba Updated wrong names.
June Park <parkjune1995@gmail.com>
parents:
diff changeset
74 async def _handle_prompt_to_token(self, batch: List[UserRequest]):
551d9fc0a2ba Updated wrong names.
June Park <parkjune1995@gmail.com>
parents:
diff changeset
75 pass