Mercurial
comparison mrjunejune/src/blog/thoughts-on-tdd/index.md @ 100:65e5a5b89a4e
[Seobeo] Migrated everything to this page.
| author | June Park <parkjune1995@gmail.com> |
|---|---|
| date | Sat, 03 Jan 2026 07:48:07 -0800 |
| parents | |
| children | 1c0878eb17de |
comparison
equal
deleted
inserted
replaced
| 99:684edfaf93b7 | 100:65e5a5b89a4e |
|---|---|
| 1 # Thoughts on TDD | |
| 2 | |
| 3 Is testing important? Ask yourself that question. If you had to think about it for more than a few seconds, you’re either an inexperienced programmer or someone who has never had to release a product to a large group of users. Testing is not just important—it’s essential. It ensures that the software you release is less buggy and more stable because it allows you to catch issues before your customers do. That’s the "why" behind testing that everyone would agree to. | |
| 4 | |
| 5 The real debates arise around the *how*—specifically, approaches to testing, including methodologies like Test-Driven Development (TDD). Over my career, I’ve worked at multiple companies, all of which practiced TDD in some form. This often involved writing unit tests, integration tests, and end-to-end (e2e) tests. However, despite the commonality of TDD, every company seemed to implement it in its own convoluted way, making the code harder to write, debug, and maintain. I want to talk about this practice and my problem with it. | |
| 6 | |
| 7 ## Unit Tests | |
| 8 | |
| 9 > "Unit testing is the process where you test the smallest functional unit of code. Software testing helps ensure code quality, and it's an integral part of software development. It's a software development best practice to write software as small, functional units then write a unit test for each code unit." | |
| 10 > | |
| 11 > — Definition from AWS | |
| 12 | |
| 13 You might agree with the above definition, disagree, or be unsure about what qualifies as a "unit," but most people are likely to agree with it overall. Now, imagine you’re a 21-year-old physics graduate with three months of self-taught coding experience (and a serious League of Legends addiction). Somehow, you land your first job as a software engineer and are tasked with writing a serializer for a `GET` API in Django—and... you guessed it! testing it! | |
| 14 | |
| 15 Here's what such a serializer might look like: | |
| 16 | |
| 17 ```python | |
| 18 from rest_framework import serializers | |
| 19 | |
| 20 class FooSerializer(serializers.Serializer): | |
| 21 name = serializers.CharField(max_length=100) | |
| 22 unit_price = serializers.FloatField() | |
| 23 quantity_on_hand = serializers.IntegerField(default=0) | |
| 24 | |
| 25 def create(self, validated_data): | |
| 26 return Foo.objects.create(**validated_data) | |
| 27 | |
| 28 def update(self, instance, validated_data): | |
| 29 instance.name = validated_data.get('name', instance.name) | |
| 30 instance.unit_price = validated_data.get('unit_price', instance.unit_price) | |
| 31 instance.quantity_on_hand = validated_data.get('quantity_on_hand', instance.quantity_on_hand) | |
| 32 instance.save() | |
| 33 return instance | |
| 34 ``` | |
| 35 | |
| 36 If you’re unfamiliar with Django, the `create` and `update` methods save or update records in the database. It’s normal to serialize an object like this into a response for use in an API endpoint, often with a tool like `JSONRenderer().render(foo.data)`. | |
| 37 | |
| 38 Now, we understand detailed implementation as much as senior INSERT_LIBRARY engineer, the question is: *How do you write a unit test for this?* | |
| 39 | |
| 40 | |
| 41 There are two main reasons why this serializer is hard to test as a true "unit": | |
| 42 | |
| 43 **Dependency on Framework Classes** | |
| 44 | |
| 45 The serializer inherits from Django REST Framework’s `Serializer` class, which comes with built-in behaviors for things like validation and field handling. If you test whether `name` exceeds 100 characters or if `unit_price` is a float, you’re essentially testing whether Django itself works, which is redundant. But you still need to create a test given a *correct* vallue and *incorrect* value because you are testing more for business logic rather than the code. So are we creating unit test for the function ? or the library? | |
| 46 | |
| 47 **Database Dependency** | |
| 48 | |
| 49 The `create` and `update` methods interact with the database directly. Testing them requires either mocking the database (which introduces complexity). If you try to write a test that checks whether the `create` method works, you might end up mocking the `Foo` model, overriding its methods, and verifying that the mock functions were called with the correct arguments. While this might make sense for complex logic, it often feels like overkill for a simple serializer. | |
| 50 | |
| 51 Here’s an example of how you might write such a *unit* test: | |
| 52 | |
| 53 ```python | |
| 54 from unittest.mock import patch | |
| 55 from rest_framework.exceptions import ValidationError | |
| 56 | |
| 57 CORRECT_DATA = { | |
| 58 'name': 'Test Item', | |
| 59 'unit_price': 10.99, | |
| 60 'quantity_on_hand': 5 | |
| 61 } | |
| 62 BAD_DATA = { | |
| 63 'name': 'Test Item' * 100, | |
| 64 'unit_price': "yo", | |
| 65 'quantity_on_hand': 5 | |
| 66 } | |
| 67 | |
| 68 def test_serializer_create(): | |
| 69 serializer = FooSerializer(data=CORRECT_DATA) | |
| 70 | |
| 71 # This is where you are testing the frameworks.... | |
| 72 assert serializer.is_valid() | |
| 73 | |
| 74 with patch('app.models.Foo.objects.create') as mock_create: | |
| 75 serializer.save() | |
| 76 mock_create.assert_called_once_with( | |
| 77 name='Test Item', | |
| 78 unit_price=10.99, | |
| 79 quantity_on_hand=5 | |
| 80 ) | |
| 81 | |
| 82 # And you need to create 3 more! (2 update using CORRECT_DATA, and BAD_DATA and 1 with create only) | |
| 83 ``` | |
| 84 | |
| 85 This test ensures that the `create` method is called with the right data. But if the serializer logic gets more complex, the mocking can become cumbersome and annoying. Think about below case, where `FooSerializers` needs to have a redis singleton because it needs to store certain data inside of redis for faster accesibility for few minutes. | |
| 86 | |
| 87 ```python | |
| 88 class FooSerializer(serializers.Serializer): | |
| 89 ... | |
| 90 def create(self, validated_data): | |
| 91 self.bar(**validated_data, { ttl: 1000 }) | |
| 92 ... | |
| 93 | |
| 94 @cache_property | |
| 95 def bar(self): | |
| 96 return self.get_bar() | |
| 97 | |
| 98 def get_bar(self): | |
| 99 return Bar() | |
| 100 ``` | |
| 101 | |
| 102 In this case, the `get_bar` method is introduced to facilitate testing of the singleton behavior. This allows you to override the `get_bar` method during tests to avoid interacting with the actual Redis singleton. However, this adds another layer of complexity to your tests. | |
| 103 | |
| 104 ```python | |
| 105 def test_serializer_create(): | |
| 106 serializer = FooSerializer(data=CORRECT_DATA) | |
| 107 | |
| 108 assert serializer.is_valid() | |
| 109 | |
| 110 # Mocking the database interaction | |
| 111 with patch('app.models.Foo.objects.create') as mock_create: | |
| 112 # Mocking the singleton method | |
| 113 with patch.object(serializer, 'get_bar', return_value=MockBar()) as mock_bar: | |
| 114 serializer.save() | |
| 115 mock_bar.assert_called_once_with( | |
| 116 name='Test Item', | |
| 117 unit_price=10.99, | |
| 118 quantity_on_hand=5 | |
| 119 | |
| 120 ) | |
| 121 mock_create.assert_called_once_with( | |
| 122 name='Test Item', | |
| 123 unit_price=10.99, | |
| 124 quantity_on_hand=5 | |
| 125 ) | |
| 126 ``` | |
| 127 | |
| 128 You can imagine how the unit test would look as more cascading effects are introduced. With all these 76 lines of code (19 for each case, accounting for 4 tests: 2 with correct data values and 2 with incorrect data values), it essentially tests... | |
| 129 | |
| 130 1. Whether the `create`, `update`, and `Redis` functions are called. | |
| 131 2. Whether the data has the correct units. | |
| 132 | |
| 133 <div class="center"> <img src="https://media3.giphy.com/media/v1.Y2lkPTc5MGI3NjExbHpyY3A5ZjlzNzF5Nmp2Zm13M3kzZWU4Znl4djNmZWoxemRkaHVhbSZlcD12MV9pbnRlcm5hbF9naWZfYnlfaWQmY3Q9Zw/5sNxGQsE3RXap07jQ4/giphy.webp" /> </div> | |
| 134 | |
| 135 | |
| 136 ### Different Approach: Unit Testing in Rails | |
| 137 | |
| 138 In frameworks like Ruby on Rails, a different paradigm is used for handling tests, one that might directly conflict with the traditional definition of unit tests. Instead of relying heavily on mocking and isolating functions, frameworks like Rails encourage spinning up a test database or any external dependencies that are *essential*. Here’s how the same serializer would look in Rails using Active Record: | |
| 139 | |
| 140 ```ruby | |
| 141 class Foo < ApplicationRecord | |
| 142 validates :name, presence: true, length: { maximum: 100 } | |
| 143 validates :unit_price, numericality: true | |
| 144 end | |
| 145 ``` | |
| 146 | |
| 147 And here’s how a test might look: | |
| 148 | |
| 149 ```ruby | |
| 150 require 'test_helper' | |
| 151 | |
| 152 class FooTest < ActiveSupport::TestCase | |
| 153 test "validates name length and prevents invalid record from saving" do | |
| 154 foo = Foo.new(name: "a" * 101, unit_price: 10.99, quantity_on_hand: 5) | |
| 155 | |
| 156 assert_not foo.valid?, "Foo should be invalid when name exceeds 100 characters" | |
| 157 assert_equal ["is too long (maximum is 100 characters)"], foo.errors[:name] | |
| 158 assert_not foo.save, "Foo with an invalid name should not be saved to the database" | |
| 159 assert_nil Foo.find_by(name: "a" * 101), "No Foo record with an invalid name should exist in the database" | |
| 160 end | |
| 161 | |
| 162 test "saves valid record to the database" do | |
| 163 foo = Foo.new(name: "Valid Item", unit_price: 10.99, quantity_on_hand: 5) | |
| 164 | |
| 165 assert foo.valid?, "Foo should be valid with correct attributes" | |
| 166 assert foo.save, "Foo with valid attributes should be saved to the database" | |
| 167 | |
| 168 # Look for saved foo and bar | |
| 169 saved_foo = Foo.find_by(name: "Valid Item") | |
| 170 saved_bar = Bar.find_by(name: "Valid Item") | |
| 171 | |
| 172 assert_not_nil saved_foo, "Foo record should exist in the database" | |
| 173 assert_equal 10.99, saved_foo.unit_price, "Foo's unit_price should match the saved value" | |
| 174 assert_equal 5, saved_foo.quantity_on_hand, "Foo's quantity_on_hand should match the saved value" | |
| 175 assert_equal 10.99, saved_bar.unit_price, "Bar's unit_price should match the saved value" | |
| 176 end | |
| 177 end | |
| 178 ``` | |
| 179 | |
| 180 In this example, you’re not mocking anything. Instead, you’re leveraging the test database to directly validate business logic. This approach often feels cleaner and less convoluted than mocking, especially for frameworks with tightly integrated ORM layers like Active Record. If your application also involves a Redis instance or similar components, you would create those as part of your test setup and verify their existence. This approach, however, goes strictly against AWS's definition of unit tests. At my previous job, I remember having a heated argument with an engineer over this—he was a strong believer in AWS's strict definition, while I preferred a looser interpretation based on my experience with Rails. | |
| 181 | |
| 182 **(And no, starting up a Docker container for a Redis instance or database is not SOOO slow that shouldn't be done. This is a widely accepted practice in many engineering companies, including Shopify and Airbnb.)** | |
| 183 | |
| 184 If you ask me, I’d argue that Rails' approach is slightly better because you avoid the overhead of creating numerous mock objects. For example, in Jest or Python, mocking can get out of hand—especially if you need to mock React hooks that query or mutate global states instead of initializing the state directly within the test. It’s overwhelming to see ten different mock values inserted into a single object, and now imagine having to do that for every single function you write. But now, we will have conflict with... | |
| 185 | |
| 186 ## Integration Tests | |
| 187 | |
| 188 > "A type of software test that verifies how different components of your application and services interact and work together as a system, ensuring data flows correctly between them and that the overall functionality is as expected." | |
| 189 > | |
| 190 > — Definition from AWS | |
| 191 | |
| 192 We’ve essentially already done this in our Ruby on Rails testing, so let’s revisit our `FooSerializer`. Integration tests ensure that the serializer works as expected. Here's an example of such a test: | |
| 193 | |
| 194 ```python | |
| 195 def test_serializer_create(): | |
| 196 serializer = FooSerializer(data=CORRECT_DATA) | |
| 197 | |
| 198 assert serializer.is_valid() | |
| 199 serializer.save() | |
| 200 assert Foo.objects.filter(name=CORRECT_DATA.name).exists() | |
| 201 ``` | |
| 202 | |
| 203 At this point, we’re essentially rewriting the same tests but with less mocking. So, what’s the real value of this? There has to be a scenario where unit tests are useful, and integration tests aren’t—or vice versa. In this particular case, though, it’s hard to distinguish that line. As applications grow more complex, these distinctions may become clearer. However, for most scenarios like this one, if tests weren’t written at all, it would be difficult to identify what’s missing or redundant. | |
| 204 | |
| 205 In this specific case, many would agree that writing both unit tests and integration tests offers little value. It’s almost redundant and doesn’t justify the time and effort required. | |
| 206 | |
| 207 ## E2E Tests | |
| 208 | |
| 209 I couldn't find a definition so I am going to just ask gemini for it. | |
| 210 | |
| 211 > End-to-end (E2E) testing is a software testing method that verifies how a software product works from start to finish. It's also known as system testing or broad stack testing | |
| 212 > | |
| 213 > — Gemini version that doesn't create controversial images. | |
| 214 | |
| 215 Finally, let’s talk about end-to-end (E2E) tests. The serializer function we wrote will likely live inside some API endpoint, and I don’t feel like writing codes for it so I hope you guys can imagine it. In most cases, E2E tests are run through a virtual machine (VM) or Docker container that replicates a smaller version of the real server and database, seeded with factory data. These environments are spun up during CI/CD pipelines, where tests simulate real server requests. | |
| 216 | |
| 217 E2E testing is extremely useful when your server is self-contained. However, if your application relies on third-party services, things can get tricky. For example, let’s consider a case where your application depends on an AWS service to check content safety. You can’t easily mimic that service locally, so you’d need to test against their live servers, which may incur costs for every request. As a result, you’ll likely want to separate those tests from your main CI/CD pipeline and use a dedicated staging environment to avoid mixing with production accounts. | |
| 218 | |
| 219 That’s a *good* case. A *bad* case is when the third-party service doesn’t allow any testing at all. For instance, Google’s SSO or similar services might block testing to prevent potential DDoS attacks (Because it is a DDoS attack). In these scenarios, your E2E tests often become glorified integration tests, where you simulate the service instead of interacting with the real thing. | |
| 220 | |
| 221 In short, TDD in its purest sense—using unit, integration, and E2E tests together—is often impractical or unachievable. I haven’t even touched on the time required to run E2E tests, the context-switching needed to write or debug them, flakiness, and other challenges. Still, we write tests because they are critical to delivering reliable software. After all, we’re not open-source engineers releasing products that randomly break because of dependencies like `is_number` in npm, right? Right? | |
| 222 | |
| 223 | |
| 224 ## My Thoughts | |
| 225 | |
| 226 We should primarily focus on integration and E2E tests that validate real user activity. It’s not always necessary to test the entire flow, especially when certain parts—like SSO—can only be mocked and not tested fully. E2E tests are often slow, flaky (the worst!), and sometimes impossible to implement comprehensively. | |
| 227 | |
| 228 Unit tests should be reserved for complex functions that require isolation. My general approach to testing is like a binary search: | |
| 229 | |
| 230 - If a unit test fails, the problem is always within the business logic of that specific function. | |
| 231 - If an integration test fails, the issue is likely due to mismatched input/output between components. | |
| 232 - If an E2E test fails, it could indicate a third-party service outage, a race condition, or a broader system issue. | |
| 233 | |
| 234 For simple components like our `FooSerializer`, writing three separate tests (unit, integration, and E2E) is overkill at best and a waste of time at worst. Instead, focus on testing critical user flows and isolating tests to specific areas when necessary. This strikes a balance between test coverage and productivity, avoiding the trap of excessive and redundant testing. |