annotate grok_interview/inference_my_version.py @ 54:b3e82d22f961

[PostDog] Initial commit BROKEN
author June Park <parkjune1995@gmail.com>
date Fri, 19 Dec 2025 13:58:52 -0800
parents 68fa88ac73fe
children
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
51
68fa88ac73fe Interview prep for xAI
June Park <parkjune1995@gmail.com>
parents:
diff changeset
1 """
68fa88ac73fe Interview prep for xAI
June Park <parkjune1995@gmail.com>
parents:
diff changeset
2 Inference Questions
68fa88ac73fe Interview prep for xAI
June Park <parkjune1995@gmail.com>
parents:
diff changeset
3
68fa88ac73fe Interview prep for xAI
June Park <parkjune1995@gmail.com>
parents:
diff changeset
4 Context
68fa88ac73fe Interview prep for xAI
June Park <parkjune1995@gmail.com>
parents:
diff changeset
5
68fa88ac73fe Interview prep for xAI
June Park <parkjune1995@gmail.com>
parents:
diff changeset
6 You are tasked with building a simplified inference engine component responsible for handling incoming user requests for a large language model (LLM). To optimize throughput and GPU utilization, the engine must batch multiple requests together, run the inference call once per batch, and then deconstruct the results to return token-level output to the individual users.
68fa88ac73fe Interview prep for xAI
June Park <parkjune1995@gmail.com>
parents:
diff changeset
7
68fa88ac73fe Interview prep for xAI
June Park <parkjune1995@gmail.com>
parents:
diff changeset
8 Objective
68fa88ac73fe Interview prep for xAI
June Park <parkjune1995@gmail.com>
parents:
diff changeset
9
68fa88ac73fe Interview prep for xAI
June Park <parkjune1995@gmail.com>
parents:
diff changeset
10 Complete the provided Python class, BatchInferenceEngine by implementing the methods necessary to:
68fa88ac73fe Interview prep for xAI
June Park <parkjune1995@gmail.com>
parents:
diff changeset
11
68fa88ac73fe Interview prep for xAI
June Park <parkjune1995@gmail.com>
parents:
diff changeset
12 Queue incoming user requests.
68fa88ac73fe Interview prep for xAI
June Park <parkjune1995@gmail.com>
parents:
diff changeset
13
68fa88ac73fe Interview prep for xAI
June Park <parkjune1995@gmail.com>
parents:
diff changeset
14 Process a batch when the queue reaches a defined batch size.
68fa88ac73fe Interview prep for xAI
June Park <parkjune1995@gmail.com>
parents:
diff changeset
15 Simulate the token-level output from an LLM and correctly associate each generated token with its original request.
68fa88ac73fe Interview prep for xAI
June Park <parkjune1995@gmail.com>
parents:
diff changeset
16 """
68fa88ac73fe Interview prep for xAI
June Park <parkjune1995@gmail.com>
parents:
diff changeset
17
68fa88ac73fe Interview prep for xAI
June Park <parkjune1995@gmail.com>
parents:
diff changeset
18 import asyncio
68fa88ac73fe Interview prep for xAI
June Park <parkjune1995@gmail.com>
parents:
diff changeset
19 from dataclasses import dataclass, field
68fa88ac73fe Interview prep for xAI
June Park <parkjune1995@gmail.com>
parents:
diff changeset
20 from time import time
68fa88ac73fe Interview prep for xAI
June Park <parkjune1995@gmail.com>
parents:
diff changeset
21 from typing import Dict, List
68fa88ac73fe Interview prep for xAI
June Park <parkjune1995@gmail.com>
parents:
diff changeset
22 import uuid
68fa88ac73fe Interview prep for xAI
June Park <parkjune1995@gmail.com>
parents:
diff changeset
23
68fa88ac73fe Interview prep for xAI
June Park <parkjune1995@gmail.com>
parents:
diff changeset
24 @dataclass
68fa88ac73fe Interview prep for xAI
June Park <parkjune1995@gmail.com>
parents:
diff changeset
25 class UserRequest:
68fa88ac73fe Interview prep for xAI
June Park <parkjune1995@gmail.com>
parents:
diff changeset
26 prompt: str
68fa88ac73fe Interview prep for xAI
June Park <parkjune1995@gmail.com>
parents:
diff changeset
27 id: str = field(default_factory=lambda: str(uuid.uuid4()))
68fa88ac73fe Interview prep for xAI
June Park <parkjune1995@gmail.com>
parents:
diff changeset
28 created_at: float = field(default_factory=time)
68fa88ac73fe Interview prep for xAI
June Park <parkjune1995@gmail.com>
parents:
diff changeset
29
68fa88ac73fe Interview prep for xAI
June Park <parkjune1995@gmail.com>
parents:
diff changeset
30
68fa88ac73fe Interview prep for xAI
June Park <parkjune1995@gmail.com>
parents:
diff changeset
31 @dataclass
68fa88ac73fe Interview prep for xAI
June Park <parkjune1995@gmail.com>
parents:
diff changeset
32 class TokenOutput:
68fa88ac73fe Interview prep for xAI
June Park <parkjune1995@gmail.com>
parents:
diff changeset
33 request_id: str
68fa88ac73fe Interview prep for xAI
June Park <parkjune1995@gmail.com>
parents:
diff changeset
34 token: bytes
68fa88ac73fe Interview prep for xAI
June Park <parkjune1995@gmail.com>
parents:
diff changeset
35
68fa88ac73fe Interview prep for xAI
June Park <parkjune1995@gmail.com>
parents:
diff changeset
36 class BatchInferenceEngine:
68fa88ac73fe Interview prep for xAI
June Park <parkjune1995@gmail.com>
parents:
diff changeset
37
68fa88ac73fe Interview prep for xAI
June Park <parkjune1995@gmail.com>
parents:
diff changeset
38 def __init__(self, batch_sizes: int = 8):
68fa88ac73fe Interview prep for xAI
June Park <parkjune1995@gmail.com>
parents:
diff changeset
39 self.queue = []
68fa88ac73fe Interview prep for xAI
June Park <parkjune1995@gmail.com>
parents:
diff changeset
40 self.request_token_map: Dict[str, str] = {}
68fa88ac73fe Interview prep for xAI
June Park <parkjune1995@gmail.com>
parents:
diff changeset
41 self.batch_sizes = batch_sizes
68fa88ac73fe Interview prep for xAI
June Park <parkjune1995@gmail.com>
parents:
diff changeset
42
68fa88ac73fe Interview prep for xAI
June Park <parkjune1995@gmail.com>
parents:
diff changeset
43 self._lock = asyncio.Lock()
68fa88ac73fe Interview prep for xAI
June Park <parkjune1995@gmail.com>
parents:
diff changeset
44 self._batch_event = asyncio.Event()
68fa88ac73fe Interview prep for xAI
June Park <parkjune1995@gmail.com>
parents:
diff changeset
45
68fa88ac73fe Interview prep for xAI
June Park <parkjune1995@gmail.com>
parents:
diff changeset
46 async def add_to_queue(self, request: UserRequest):
68fa88ac73fe Interview prep for xAI
June Park <parkjune1995@gmail.com>
parents:
diff changeset
47 async with self._lock:
68fa88ac73fe Interview prep for xAI
June Park <parkjune1995@gmail.com>
parents:
diff changeset
48 self.queue.append(request)
68fa88ac73fe Interview prep for xAI
June Park <parkjune1995@gmail.com>
parents:
diff changeset
49
68fa88ac73fe Interview prep for xAI
June Park <parkjune1995@gmail.com>
parents:
diff changeset
50 if len(self.queue) > self.batch_sizes:
68fa88ac73fe Interview prep for xAI
June Park <parkjune1995@gmail.com>
parents:
diff changeset
51 self._batch_event.set()
68fa88ac73fe Interview prep for xAI
June Park <parkjune1995@gmail.com>
parents:
diff changeset
52
68fa88ac73fe Interview prep for xAI
June Park <parkjune1995@gmail.com>
parents:
diff changeset
53 task = asyncio.create_task(self._batch())
68fa88ac73fe Interview prep for xAI
June Park <parkjune1995@gmail.com>
parents:
diff changeset
54 return task
68fa88ac73fe Interview prep for xAI
June Park <parkjune1995@gmail.com>
parents:
diff changeset
55
68fa88ac73fe Interview prep for xAI
June Park <parkjune1995@gmail.com>
parents:
diff changeset
56 async def _batch(self):
68fa88ac73fe Interview prep for xAI
June Park <parkjune1995@gmail.com>
parents:
diff changeset
57 while True:
68fa88ac73fe Interview prep for xAI
June Park <parkjune1995@gmail.com>
parents:
diff changeset
58 try:
68fa88ac73fe Interview prep for xAI
June Park <parkjune1995@gmail.com>
parents:
diff changeset
59 await asyncio.wait_for(self._batch_event.wait(), timeout=0.05)
68fa88ac73fe Interview prep for xAI
June Park <parkjune1995@gmail.com>
parents:
diff changeset
60 except:
68fa88ac73fe Interview prep for xAI
June Park <parkjune1995@gmail.com>
parents:
diff changeset
61 raise Exception("Timed out")
68fa88ac73fe Interview prep for xAI
June Park <parkjune1995@gmail.com>
parents:
diff changeset
62
68fa88ac73fe Interview prep for xAI
June Park <parkjune1995@gmail.com>
parents:
diff changeset
63 async with self._lock:
68fa88ac73fe Interview prep for xAI
June Park <parkjune1995@gmail.com>
parents:
diff changeset
64 if not self.queue:
68fa88ac73fe Interview prep for xAI
June Park <parkjune1995@gmail.com>
parents:
diff changeset
65 self._batch_event.clear()
68fa88ac73fe Interview prep for xAI
June Park <parkjune1995@gmail.com>
parents:
diff changeset
66 continue
68fa88ac73fe Interview prep for xAI
June Park <parkjune1995@gmail.com>
parents:
diff changeset
67
68fa88ac73fe Interview prep for xAI
June Park <parkjune1995@gmail.com>
parents:
diff changeset
68 batch = self.queue[:self.batch_sizes]
68fa88ac73fe Interview prep for xAI
June Park <parkjune1995@gmail.com>
parents:
diff changeset
69 tokens = await self._handle_prompt_to_token(batch)
68fa88ac73fe Interview prep for xAI
June Park <parkjune1995@gmail.com>
parents:
diff changeset
70
68fa88ac73fe Interview prep for xAI
June Park <parkjune1995@gmail.com>
parents:
diff changeset
71 return tokens
68fa88ac73fe Interview prep for xAI
June Park <parkjune1995@gmail.com>
parents:
diff changeset
72
68fa88ac73fe Interview prep for xAI
June Park <parkjune1995@gmail.com>
parents:
diff changeset
73
68fa88ac73fe Interview prep for xAI
June Park <parkjune1995@gmail.com>
parents:
diff changeset
74 async def _handle_prompt_to_token(self, batch: List[UserRequest]):
68fa88ac73fe Interview prep for xAI
June Park <parkjune1995@gmail.com>
parents:
diff changeset
75 pass