430: Test Suite Pain & Anti-Patterns
Stephanie and Joël discuss the recent announcement of the call for proposals for RubyConf in November. Joël is working on his proposals and encouraging his colleagues at thoughtbot to participate, while Stephanie is excited about the conference being held in her hometown of Chicago!
The conversation shifts to Stephanie's recent work, including completing a significant client project and her upcoming two-week refactoring assignment. She shares her enthusiasm for refactoring code to improve its structure and stability, even when it's not her own. Joël and Stephanie also discuss the everyday challenges of maintaining a test suite, such as slowness, flakiness, and excessive database requests. They discuss strategies to balance the test pyramid and adequately test critical paths.
Finally, Joël emphasizes the importance of separating side effects from business logic to enhance testability and reduce complexity, and Stephanie highlights the need to address testing pain points and ensure tests add real value to the codebase.
- RubyConf CFP
- RubyConf CFP coaching
- Testing pyramid
- Outside-in testing
- Writing fewer system specs with request specs
- Unnecessary factories
- Your Test Suite is Making Too Many Database Calls
- Your flaky tests might be time dependent
- The Secret Ingredient: How To Understand and Resolve Just About Any Flaky Test
- Separating side effects to improve tests
- Functional core, imperative shell
- Thoughtbot testing articles
Transcript:
STEPHANIE: Hello and welcome to another episode of The Bike Shed, a weekly podcast from your friends at thoughtbot about developing great software. I'm Stephanie Minn.
JOËL: And I'm Joël Quenneville. And together, we're here to share a bit of what we've learned along the way.
STEPHANIE: So, Joël, what's new in your world?
JOËL: Something that's new in my world is that RubyConf just announced their call for proposals for RubyConf in November. They're open for...we're currently recording in June, and it's open through early July, and they're asking people everywhere to submit talk ideas. I have a few of my own that I'm working with. And then, I'm also trying to mobilize a lot of other colleagues at thoughtbot to get excited to submit.
STEPHANIE: Yes, I am personally very excited about this year's RubyConf in November because it's in Chicago, where I live, so I have very little of an excuse not to go [laughs]. I feel like so much of my conference experience is traveling to just kind of, like, other cities in the U.S. that I want to spend some time in and, you know, seeing all of my friends from...my long-distance friends. And it definitely does feel like just a bit of an immersive week, right? And so, I wonder how weird it will feel to be going to this conference and then going home at the end of the night. Yeah, that's just something that I'm a bit curious about. So, yeah, I mean, I am very excited. I hope everyone comes to Chicago. It's a great city.
JOËL: I think the pitch that I'm hearing is submit a proposal to the RubyConf CFP to get a chance to get a free ticket to go to RubyConf, where you get to meet Bike Shed co-host Stephanie Minn.
STEPHANIE: Yes. Ruby Central should hire me to market this conference [laughter] and that being the main value add of going [laughs], obviously. Jokes aside, I'm excited for you to be doing this initiative again because it was so successful for RailsConf kind of internally at thoughtbot. I think a lot of people submitted proposals for the first time with some of the programming you put on. Are you thinking about doing things any differently from last time, or any new thoughts about this conference cycle?
JOËL: I think I'm iterating on what we did last time but trying to keep more or less the same formula. Among other things, people don't always have ideas immediately of what they want to speak about. And so, I have a brainstorming session where we're just going to get together and brainstorm a bunch of topics that are free for anyone to take. And then, either someone can grab one of those topics and pitch a talk on it, or it can be, like, inspiration where they see that it jogs their mind, and they have an idea that then they go off and write a proposal.
And so, that allows, I think, a lot of colleagues as well, who are maybe not interested in speaking but might have a lot of great ideas, to participate and sort of really get a lot of that energy going. And then, from there, people who are excited to speak about something can go on to maybe draft a proposal. And then, I've got a couple of other events where we support people in drafting a proposal and reviewing and submitting, things like that.
STEPHANIE: Yes, I really love how you're just involving people with, you know, just different skills and interests to be able to support each other, even if, you know, there's someone on our team who's, like, not interested in speaking at all, but they're, like, an ideas person, right? And they would love to see their idea come to life in a talk from someone else. Like, I think that's really cool, and I certainly appreciate it as a not ideas person [laughs].
JOËL: Also, I want to shout out that Ruby Central is doing CFP coaching sessions on June 24th, June 25th, and June 26th, and those are open to anyone. You can sign up. We'll put a link to the signup form in the show notes. If you've never submitted something before and you'd like some tips on what makes for a good CFP, how can you up your chances of getting accepted, or maybe you've submitted before, you just want to get better at it; I recommend joining one of those slots. So, Stephanie, what's new in your world?
STEPHANIE: So, I just successfully delivered a big project on my client work last week. So, I'm kind of riding that wave and getting into the next bit of work that I have been assigned for this team, and I'm really excited to do this. But I also, I don't know, I've been just, like, thinking about it quite a bit. Basically, I'm getting to spend two dedicated weeks to just refactoring [laughs] some really, I guess, complicated code that has led to some bugs recently and just needing some love, especially because there's some whiffs of potentially, like, really investing in this area of the product, and people wanting to make sure that the foundation does feel very stable to build on top of for extending and changing that code.
And I think I, like, surprised myself by how excited I was to do this because it's not even code I wrote. You know, sometimes when you are the one who wrote code, you're like, oh, like, I would love time to just go back and clean up all these things that I kind of missed the first time around or just couldn't attend to for whatever reason. But yeah, I think I was just a little bit in the peripheries of that code, and I was like, oh, like, just seeing some weird stuff. And now to kind of have the time to be like, oh, this is all I'm going to be doing for two weeks, to, like, really dive into it and get my hands dirty [laughs], I'm very excited.
JOËL: I think that refactoring is a thing that can be really fun. And also, when you have a larger chunk of time, like two days, it's easy to sort of get lost in sort of grand visions or projects. How do you kind of balance the, I want to do a lot of refactoring; I want to take on some bigger things while maybe trying to keep some focus or have some prioritization?
STEPHANIE: Yeah, that's a great question. I was actually the one who said, like, "I want two weeks on this." And it also helped that, like, there was already some thoughts about, like, where they wanted to go with this area of the codebase and maybe what future features they were thinking about. And there are also a few bugs that I am fixing kind of related to this domain. So, I think that is actually what I started with.
And that was really helpful in just kind of orienting myself in, like, the higher impact areas and the places that the pain is felt and exploring there first to, like, get a sense of what is going on here. Because I think that information gathering is really important to be able to kind of start changing the code towards what it wants to be and what other devs want it to be.
I actually also started a thread in Slack for my team. I was, like, asking for input on what's the most confusing or, like, hard to reason about files or areas in this particular domain or feature set and got a lot of really good engagement. I was pleasantly surprised [laughs], you know, because sometimes you, like, ask for feedback and just crickets. But I think, for me, it was very affirming that I was, like, exploring something that a lot of people are like, oh, we would love for someone to, you know, have just time to get into this. And they all were really excited for me, too. So, that was pretty cool.
JOËL: Interesting. So, it sounds like you sort of budgeted some refactoring time and then, from there, broke it down into a series of a couple of debugging projects and then a couple of, like, more bounded refactoring projects, where, like, specifically, I want to restructure the way this object works or something like that.
STEPHANIE: Yeah. I think there was that feeling of wanting to clean up this area of the codebase, but you kind of caught on to that bit of, you know, it can go so many different ways. And, like, how do you balance your grand visions [laughs] of things with, I guess, a little bit of pragmatism? So, it was very much like, here's all these bugs that are causing our customers problems that are kind of, like, hard for the devs to troubleshoot. You know, that kind of prompts the question, like, why?
And so, if there can be, you know, the fixing of the bugs, and then the learning of, like, how that part of the system works, and then, hopefully, some improvements along the way, yeah, that just felt like a dream [laughs] for me. And two weeks felt about the right amount of time. I don't know if anyone kind of hears that and feels like it's too long or too little. I would be really curious. But I feel like it is complex enough that, like, context switching would, I think, make this work harder, and you kind of do have to just sit with it for a little bit to get your bearings.
JOËL: A scenario that we encounter on a pretty regular basis is a customer coming to us and telling us that they're feeling a lot of test pain and asking what are the ways that we can help them to make things better and that test pain can come under a lot of forms.
It might be a test suite that's really slow and that's hurting the team in terms of their velocity. It might be a test suite that is really flaky. It might be one that is really difficult to add to, or maybe one that has very low coverage, or one that is just really brittle. Anytime you try to make a change to it, a bunch of things break, and it takes forever to ship anything. So, there's a lot of different aspects of challenging test suites that clients come to us with.
I'm curious, Stephanie, what are some of the ones that you've encountered most frequently?
STEPHANIE: I definitely think that a slow test suite and a flaky test suite end up going hand in hand a lot, or just a brittle one, right? That is slowing down development and, like you said, causing a lot of pain. I think even if that's not something that a client is coming to us directly about, it maybe gets, like, surfaced a little bit, you know, sometime into the engagement as something that I like to keep an eye on as a consultant. And I actually think, yeah, that's one of kind of the coolest things, I think, about our consulting work is just getting to see so many different test suites [laughs]. I don't know. I'm a testing nerd, so I love that kind of stuff.
And then, I think you were also kind of touching on this idea of, like, maintaining a test suite and, yeah, making testing just a better experience. I have a theory [laughs], and I'd be curious to get your thoughts on it. But one thing that I really struggle with in the industry is when people talk about writing tests as if it's, like, the morally superior thing to do. And I struggle with this because I don't think that it is a very good strategy for helping people feel better or more confident and, like, upskill at writing tests.
I think it kind of shames people a little bit who maybe either just haven't gotten that experience or, you know, just like, yeah, like, for whatever reason, are still learning how to do this thing. And then, I think that mindset leads to bad tests [laughs] or tests that aren't really serving the purpose that you hope they would because people are doing it more out of obligation rather than because they truly, like, feel like it adds something to their work. Okay, I kind of just dropped that on you [laughs]. Do you have any reactions?
JOËL: Yeah, I guess the idea that you're just checking a box with your test rather than writing code that adds value to the codebase. They're two very different perspectives that, in the end, will generate more lines of code if you're just doing a checkbox but may or may not add a whole lot of value. So, maybe before even looking at actual, like, test practices, it's worth stepping back and asking more of a mindset question: Why does your team test? What is the value that your team feels they get out of testing?
STEPHANIE: Yeah. Yeah. I like that because I was about to say they go hand in hand. But I do think that maybe there is some, you know, question asking [laughs] to be done because I do think people like to kind of talk about the testing practices before they've really considered that. And I am, like, pretty certain from just kind of, at least what I've seen, and what I've heard, and what I've experienced on embedding into client teams, that if your team can't answer that question of, like, "What value does testing bring?" then they probably aren't following good testing practices [laughs]. Because I do think you kind of need to approach it from a perspective of like, okay, like, I want to do this because it helps me, and it helps my team, rather than, like you said, getting the check mark.
JOËL: So, once we've sort of established maybe a bit of a mindset or we've had a conversation with the team around what value they think they're getting out of tests, or maybe even you might need to sell the team a little bit on like, "Hey, here's, like, all these different ways that testing can bring value into your life, make your life as developers easier," but once you've done that sort of pre-work and you can start looking at what's actually the problem with a test suite, a common complaint from developers is that the test suite is too slow. How do you like to approach a slow test suite?
STEPHANIE: That's a good question. I actually...I think there's a lot of ways to answer that. But to kind of stay on the theme of stepping back a little bit, I wonder if assessing how well your test suite aligns with the testing pyramid would be a good place to start; at least, that could be where I start if I'm coming into a client team for the first time, right, and being asked to start assessing or just poking around. Because I think the slowness a lot of the time comes from a lot of quote, unquote, "integration tests" or, like, unit tests masquerading as integration tests, where you end up having, like, a lot of duplication of things that are being tested in ways that are integrating with some slow parts of the system like the database.
And yeah, I think even before getting into some of the more discrete reasons why you might be writing slow tests, just looking at the structure of your test suite and what kinds of things you're testing, and, again, even going back to your team and asking, like, "What kinds of things do you test?" Or like, "Do you try to test or wish to be testing more of, less of?" Like looking at the structure, I have found to be a good place to start.
JOËL: And for those who are not familiar, you used the term testing pyramid. This is a concept which says that you probably want to have a lot of small, fast unit tests, a medium amount of integration tests that test a few different components together, and then a few end-to-end tests. Because as you go up that pyramid, tests become more expensive. They take a lot longer to run, whereas the little unit tests are super cheap. You can create thousands of them, and they will barely impact your run time. Adding a dozen end-to-end tests is going to be noticeable. So, you want to balance sort of the coverage that you get from end to end with the sort of cheapness and ubiquity of the little unit tests, and then split the difference for tests that are in between.
STEPHANIE: And I think that is challenging, even, you know, you're talking about how you want the peak of your pyramid to be end-to-end tests. So, you don't want a lot of them, but you do want some of them to really ensure that things are totally plumbed and working correctly. But that does require, I think, really looking at your application and kind of identifying what features are the most critical to it. And I think that doesn't get paid enough attention, at least from a lot of my client experiences. Like, sometimes teams just end up with a lot of feature bloat and can't say like, you know, they say, "Everything's important [chuckles]," but everything can't be equally important, you know?
JOËL: Right. I often like to develop using a sort of outside-in approach, where you start by writing an end-to-end test that describes the behavior that your new feature ticket is asking for and use that to drive the work that I'm doing. And that might lead to some lower-level unit tests as I'm building out different components, but the sort of high-level behavior that we're adding is driven by adding an end-to-end spec.
Do you feel that having one new end-to-end spec for every new feature ticket that you work on is a reasonable thing to do, or do you kind of pick and choose? Do you write some, but maybe start, like, coalescing or culling them, or something like that? How do you manage that idea that maybe you would or would not want one end-to-end spec for each feature ticket?
STEPHANIE: Yeah, it's a good question. Actually, as you were saying that, I was about to ask you, do you delete some afterwards [laughs]? Because I think that might be what I do sometimes, especially if I'm testing, you know, edge cases or writing, like, the end-to-end test for error states. Sometimes, not all of them make it into my, like, final, you know, commit. But they, you know, had their value, right? And at least it prompted me to make sure I thought about them and make sure that they were good error states, right? Like things that had visible UI to the user about what was going on in case of an error. So, I would say I will go back and kind of coalesce some of them, but they at least give me a place to start. Does that match your experience?
JOËL: Yeah, I tend to mostly write end-to-end tests for happy paths and then write kind of mid-level things to cover some of my edge cases, maybe a couple of end-to-end tests for particularly critical paths. But, at some point, there's just too many paths through the app that you can't have end-to-end coverage for every single branch on every single path that can happen.
STEPHANIE: Yeah, I like that because if you find yourself having a lot of different conditions that you need to test in an end-to-end situation, maybe there's room for that to, like, be better encapsulated in that, like, more, like, middle layer or, I don't know, even starting to ask questions about, like, does this make sense with the product? Like, having all of these different things going on, does that line up with kind of the vision of what this feature is trying to be or should be? Because I do think the complexity can start at that high of a level.
JOËL: How do you feel about the idea that adding more end-to-end tests, at some point, has diminishing returns?
STEPHANIE: I'm not quite sure I'm following [laughs].
JOËL: So, let's say you have an end-to-end test for the happy path for every core feature of the app. And you decide, you know what, I want to add maybe some, like, side features in, or maybe I want to have more error states. And you start, like, filling in more end-to-end tests for those. Is it fair to say that adding some of those is a bit of a diminishing return? Like, you're not getting as much value as you would from the original specs. And maybe as you keep finding more and more rare edge cases, you get less and less value for your test.
STEPHANIE: Oh, yeah, I see. And there's more of a cost, too, right? The cost of the time to run, maintain, whatever.
JOËL: Right. Let's say they're roughly all equally expensive in terms of cost to run. But as you stray further and further off of that happy path, you're getting less and less value from that integration test or that end-to-end test.
STEPHANIE: I'm actually a little conflicted about this because that sounds right in theory, but then in practice, I feel like I've seen error states not get enough love [laughs] that it's...I don't even want to say, like, you make any kind of claim [laughs] about it. But, you know, if you're going to start somewhere, if you have, like, a limited amount of time and you're like, okay, I'm only going to write a handful of end-to-end tests, yeah, like, write tests for your happy paths [laughs].
JOËL: I guess it's probably fair to say that error states just don't get as much love as they should throughout the entire testing stack: at the unit level, at the integration level, all the way up to end to end.
STEPHANIE: I'm curious if you were trying to get at some kind of conclusion, though, with the idea of diminishing returns.
JOËL: I guess I'm wondering if, from there, we can talk about maybe a breakdown of a particular testing pyramid for a particular test suite is being top heavy, and whether there's value in maybe pushing some of these tests, some of these edge cases, some of these maybe less important features down from that, like, top end-to-end layer into maybe more of an integration layer. So, in a Rails context, that might be moving system specs down to something like a request spec.
STEPHANIE: Yeah, I think that is what I tend to do. I'm trying to think of how I get there, and I'm not quite sure that I can explain it quite yet. Yeah, I don't know. Do you think you can help me out here? Like, how do you know it's time to start writing more tests for your unhappy paths lower on the pyramid?
JOËL: Ideally, I think a lot of your code should be unit-tested. And when you are unit testing it, those pieces all need coverage of the happy and unhappy paths. I think the way it may often happen naturally is if you're pushing logic out of your controllers because it's a little bit challenging sometimes to test Rails controllers.
And so, if you're moving things into domain objects, even service objects, depending on how you implement them, just doing that and then making sure you unit test them can give you a lot more coverage of all the different edge cases that can happen. Where things sometimes fall apart is getting out of that business layer into the web layer and saying, "Hey, if something raises an error or if the save fails or something like that, does the user get a good experience, or do we just crash and give them a 500 page?"
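A minimal sketch of the kind of extraction Joël describes here, with hypothetical names (an OrderCancellation object and an Order model with a shipped? predicate); the point is that the unhappy path becomes a cheap unit test against an in-memory object:

```ruby
# A plain Ruby object holding the cancellation rules (hypothetical names).
class OrderCancellation
  def initialize(order)
    @order = order
  end

  # Returns a symbol describing the outcome so callers and specs can branch
  # on it without going through the web layer.
  def call
    return :already_shipped if @order.shipped?

    @order.update!(status: "cancelled")
    :cancelled
  end
end

# The unhappy path no longer needs a request or a persisted record.
RSpec.describe OrderCancellation do
  it "refuses to cancel an order that has already shipped" do
    order = Order.new(status: "shipped")

    expect(OrderCancellation.new(order).call).to eq(:already_shipped)
  end
end
```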
STEPHANIE: Yeah, that matches with a lot of what I've seen, where if you then spend too much time in that business layer and only handling errors there, you don't really think too much about how it bubbles up. And, you know, then you are digging through, like, your error monitoring [laughs] service, trying to find out what happened so that you can tell, you know, your customer support team [laughs] to help them resolve, like, a bug report they got.
But I actually think...and you were talking about outside in, but, in some ways, in my experience, I also get feedback from the bottom up sometimes that then ends up helping me adjust some of those integration or end-to-end tests about kind of what errors are possible, like, down in the depths of the code [laughs], and then finding ways to, you know, abstract that or, like, kind of be like, "Oh, like, here are all these possible, like, exceptions that might be raised." Like, what HTTP status code do I want to be returned to capture all of these things? And what do I want to say to the user? So, yeah, I'm [laughs] kind of a little lost myself, but this idea that going both, you know, outside in and then maybe even going back up a little bit has served me well before.
JOËL: I think there can be a lot of value in sort of dropping down a level in the pyramid, and maybe instead of doing sort of end-to-end tests where you, like, trigger a scenario where something fails, you can just write a request back against the controller and say, "Hey, if I go to this controller and something raises an error, expect that you get redirected to this other location." And that's really cheap to run compared to an end-to-end test. And so, I think that, for me, is often the right compromise is handling error states at sort of the next lowest level and also in slightly more atomic pieces. So, more like, if you hit this endpoint and things go wrong, here's how things happen.
And I use endpoint not so much in an API sense, although it could be, but just your, you know, maybe you've got a flow that's multiple steps where, you know, you can do a bunch of things. But I might have a test just for one controller action to say, "Hey, if things go wrong, it redirects you here, or it shows you this error page." Whereas the end-to-end test might say, "Oh, you're going to go through the entire flow that hits multiple different controllers, and the happy path is this nice chain." But each of the exit points off at where things fail would be covered by a more scoped request spec on that controller.
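A hedged sketch of what that scoped request spec might look like; the route, controller, and PaymentGateway error are hypothetical, but the shape is the idea: post to a single endpoint, force the failure, and assert on where the user ends up.

```ruby
# spec/requests/orders_spec.rb
require "rails_helper"

RSpec.describe "Order creation", type: :request do
  it "redirects with an error message when the payment gateway raises" do
    # Stub the one collaborator that talks to the outside world.
    allow(PaymentGateway).to receive(:charge).and_raise(PaymentGateway::Error)

    post orders_path, params: { order: { total_cents: 1_000 } }

    expect(response).to redirect_to(new_order_path)
    follow_redirect!
    expect(response.body).to include("We couldn't process your payment")
  end
end
```

Compared to driving the whole flow through a browser, this runs in milliseconds, so covering each failure exit this way stays cheap.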
STEPHANIE: Yeah. Yeah. That makes sense. I like that.
JOËL: So, that's kind of how I've attempted to balance my pyramid in a way that balances complexity and time with coverage. You mentioned that another area that test suites get slow is making too many requests to the database. There's a lot of ways that that happens. Oftentimes, I think a classic is using a factory where you really don't need to, persisting data to the database when all you needed was some object in memory. So, there are different strategies for avoiding that.
It's also easy to be creating too much data. So, maybe you do need to persist some things in the database, but you're persisting a hundred objects into memory or into the database when you really meant to persist two, so that's an easy accident. A couple of years ago, I gave a talk at RailsConf titled "Your Test Suite is Making Too Many Database Requests" that went over a bunch of different ways that you can be doing a lot of expensive database requests you didn't plan on making and how that slows down your test suite. So, that is also another hot spot that I like to look at when dealing with a slow test suite.
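As one small illustration (factory and model names are hypothetical), swapping create for build or build_stubbed avoids the database entirely when the code under test only needs an object in memory:

```ruby
RSpec.describe Invoice do
  describe "#overdue?" do
    it "is true when the due date has passed" do
      # create(:invoice, ...) would write a row (plus any associations) to the
      # database for no benefit; an in-memory object is enough here.
      invoice = build_stubbed(:invoice, due_on: 3.days.ago)

      expect(invoice).to be_overdue
    end
  end
end
```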
STEPHANIE: Yeah, I mentioned earlier the idea of unit tests really masquerading as integration tests [laughs]. And I think that happens especially if you're starting with a class that may already be a little bit too big than it should be or have more responsibilities than it should be. And then, you are, like, either just, like, starting with using the create build, like, strategy with factories, or you find yourself, like, not being able to fully run the code path you're trying to test without having stuff persisted.
Those are all, I think, like, test smells that, you know, are signaling a little bit of a testing anti-pattern that, yeah, like, is there a way to write, like, true unit tests for this stuff where you are only using objects in memory? And does that require breaking out some responsibilities? That is a lot of what I am kind of going through right now, actually, with my little refactoring project [laughs] is backfilling some tests, finding that I have to create a lot of records.
And you know what? Like, the first step will probably be to write those tests and commit them, and just have them live there for a little while while I figure out, you know, the right places to start breaking things up, and that's okay. But yeah, I did want to, like, just mention that if you are having to create a lot of records and then also noticing, like, your test is running kind of slow [laughs], that could be a good indicator to just give a good, hard look at what kind of style of test you think you're writing [laughs].
JOËL: Yeah, your tests speak to you, and when you're feeling pain, oftentimes, it can be a sign that you should consider refactoring your implementation. And I think that's doubly true if you're writing tests after the fact rather than test driving. Because sometimes you sort of...you came up with an implementation that you thought would be good, and then you're writing tests for it, and it's really painful. And that might be telling you something about the underlying implementation that you have that maybe it's...you thought it's well scoped, but maybe it actually has more responsibilities than you initially realized, or maybe it's just really tightly coupled in a way that you didn't realize. And so, learning to listen to your tests and not just sort of accepting the world for being the way it is, but being like, "No, I can make it better."
STEPHANIE: Yeah, I've been really curious why people have a hard time, like, recognizing that pain sometimes, or maybe believing that this is the way it is and that there's not a whole lot that you can do about it. But it's not true, like, testing really does not have to be painful. And I feel like, again, this is one of those things that's like, it's hard to believe until you really experience it, at least, that was the case for me.
But if you're having a hard time with tests, it's not because you're not smart enough. Like, that, I think, is a thing that I really want to debunk right now [laughs] for anyone who has ever had that thought cross their mind. Yeah, things are just complicated and complex somehow, or software entropy happens. That's, like, not how it should be, and we don't have to accept that [laughs]. So, I really like what you said about, oh, you can change it. And, you know, that is a bit of a callback to the whole mindset of testing that we mentioned earlier at the beginning.
JOËL: Speaking of test suites, something we have not covered yet is parallelizing them. That could probably be its own Bike Shed episode entirely, parallelizing a test suite. We've done entire engagements where our job was to come in and help parallelize a test suite, make it faster. And there's a lot of, like, pros and cons. So, I think maybe we can save that for a different episode. And, instead, I'd like to quickly jump in a little bit to some other common pain points of test suites, and I would say probably top of that list is test flakiness. How do you tend to approach flakiness in a client project?
STEPHANIE: I am, like, laughing to myself a little bit because I know that I was dealing with test flakiness on my last client engagement, and that was, like, such a huge part of my day-to-day is, like, hitting that retry button. And now that I am on a project with, like, relatively low flakiness, I just haven't thought about it at all [laughs], which is such a privilege, I think [laughs].
But one of the first things to do is just start, like, capturing metrics around it. If you, you know, are hearing about flakiness or seeing that, like, start to plague your test suite or just, you know, cropping up in different ways, I have found it really useful to start, like, I don't know, just, like, maybe putting some of that information in a dashboard and seeing how, just to, like, even make sure that you are making improvements, that things are changing, and seeing if there's any, like, patterns around what's causing the flakiness because there are so many different causes of it.
And I think it is pretty important to figure out, like, what kind of code you're writing or just trying to wrangle. That's, you know, maybe more likely to crop up as flakiness for your particular domain or application. Yeah, I'm going to stop there and see, like, because I know you have a lot of thoughts about flakiness [laughs].
JOËL: I mean, you mentioned that there's a lot of different causes for flakiness. And I think, in my experience, they often sort of group into, let's say, like, three different buckets. Anytime you're testing code that's doing things that are non-deterministic, that's easy for tests to be flaky. And so, you might think, oh, well, you know, you have something that makes a call to random, and then you're going to assert on a particular outcome of that. Well, clearly, that's going to not be the same every time, and that might be flaky.
But there are, like, more subtle versions of that, so maybe you're relying on the system clock in some way. And, you know, depending on the time you run that test, it might give you a different value than you expect, and that might cause it to fail. And it doesn't have to be you're asserting on, like, oh, specifically a given millisecond. You might be doing math around, like, number of days, and when you get near to, let's say, the daylight savings boundary, all of a sudden, no, you're off by an hour, and your number of days...calculation breaks because relying on the clock is something that is inherently non-deterministic. Non-determinism is a bucket.
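A small sketch of pinning the clock with ActiveSupport's TimeHelpers so a date calculation can't drift across midnight or a daylight savings boundary; the Subscription model here is hypothetical:

```ruby
require "rails_helper"

RSpec.describe Subscription do
  include ActiveSupport::Testing::TimeHelpers

  it "expires 30 days after it starts" do
    # Pin the clock so the assertion can't drift depending on when it runs.
    travel_to Time.zone.local(2024, 3, 9, 12, 0, 0) do
      subscription = Subscription.new(started_on: Date.current)

      expect(subscription.expires_on).to eq(Date.current + 30.days)
    end
  end
end
```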
Leaky tests is another bucket of failures that I see, things where one test might impact another that gets run after the fact, oftentimes by mutating some sort of global state. So, maybe you're both relying on some sort of, like, external file that you're both writing to or maybe a cache that one is writing to that the other one is reading from, something like that. It could even just be writing records into the database in a way that's not wrapped in a transaction, such that there's more data in the database when the next test runs than it expects.
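A hedged example of that kind of leak and one way to contain it: one example writes to a shared cache, and clearing it around each example keeps the suite order-independent (FeatureFlags is a hypothetical class):

```ruby
RSpec.describe FeatureFlags do
  # Without this, a flag written here lingers in the shared cache and can
  # change the result of whichever test happens to run next.
  around do |example|
    Rails.cache.clear
    example.run
    Rails.cache.clear
  end

  it "reads the flag from the cache" do
    Rails.cache.write("flags/new_checkout", true)

    expect(FeatureFlags.enabled?(:new_checkout)).to be(true)
  end
end
```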
And then, finally, if you are doing any form of parallelization, that can improve your test suite speed, but it also potentially leads to race conditions, where if your resources aren't entirely isolated between parallel test runners, maybe you're sharing a database, maybe you're sharing Redis instance or whatever, then you can run into situations where you're both kind of fighting over the same resources or overriding each other's data, or things like that, in a way that can cause tests to fail intermittently. And I think having a framework like that of categorization can then help you think about potential solutions because debugging approaches and then solutions tend to be a little bit different for each of these buckets.
STEPHANIE: Yeah, the buckets of different causes of flaky tests you were talking about, I think, also reminded me that, you know, some flakiness is caused by, like, your testing environment and your infrastructure. And other kinds of flakiness are maybe caused more from just the way that you've decided how your code should work, especially that, like, non-deterministic bucket. So, yeah, I don't know, that was just, like, something that I noticed as you were going through the different categories. And yeah, like, certainly, the solutions for approaching each kind are very different.
JOËL: I would like to pitch a talk from RubyConf last year called "The Secret Ingredient: How To Resolve And Understand Just About Any Flaky Test" by Alan Ridlehoover. Just really excellent walkthrough of these different buckets and common debugging and solving approaches to each of them. And I think having that framework in mind is just a great way to approach different types of flaky tests.
STEPHANIE: Yes, I'll plus one that talk, lots of great pictures of delicious croissants as well.
JOËL: Very flaky pastry.
STEPHANIE: [laughs] Joël, do you have any last testing anti-pattern guidances for our audience who might be feeling some test pain out there?
JOËL: A quick list, I'm going to say tight coupling that has then led to having a lot of stubbing in your tests often leads to tests that are very brittle, so tests that maybe don't fail when they should when you've actually broken things, or maybe, alternatively, tests that are constantly failing for the wrong reasons. And so, that is a thing that you can fix by making your code less coupled.
Tests that also require stubbing a lot of things because you do a lot of side effects. If you are making a lot of HTTP calls or things like that, that can both make a test more complex because it has to be aware of that. But also, it can make it more non-deterministic, more flaky, and it can just make it harder to change. And so, I have found that separating side effects from sort of business logic is often a great way to make your test suite much easier to work with.
I have a blog post on that that I'll link in the show notes. And I think this maybe also approaches the idea of a functional core and an imperative shell, which I believe was an idea pitched by Gary Bernhardt, like, over ten years ago. There's a famous video on that that we'll also link in the show notes. But that architecture for building an app can lead to a much nicer test to write. I guess the general idea being that testing code that does side effects is complicated and painful. Testing code that is more functional tends to be much more pleasant. And so, by not intermingling the two, you tend to get nicer tests that are easier to maintain.
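A rough sketch of that separation, with hypothetical names throughout: the pricing rules live in a pure object that needs no stubbing, while the shell owns the API call and the save.

```ruby
# Functional core: same inputs, same output, nothing to stub.
class PriceCalculator
  def self.total(line_items, tax_rate:)
    subtotal = line_items.sum { |item| item.fetch(:unit_price) * item.fetch(:quantity) }
    (subtotal * (1 + tax_rate)).round(2)
  end
end

# Imperative shell: fetches the rate and persists the order, delegating the
# interesting logic to the core above.
class Checkout
  def initialize(order, tax_api: TaxApi.new)
    @order = order
    @tax_api = tax_api
  end

  def finalize!
    rate = @tax_api.rate_for(@order.shipping_address)
    @order.update!(total: PriceCalculator.total(@order.line_items, tax_rate: rate))
  end
end

# The core's specs stay simple and deterministic:
RSpec.describe PriceCalculator do
  it "applies the tax rate to the subtotal" do
    items = [{ unit_price: 10.0, quantity: 2 }]

    expect(PriceCalculator.total(items, tax_rate: 0.1)).to eq(22.0)
  end
end
```

Only the shell needs a test double for the tax API, and there's much less of the shell to test.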
STEPHANIE: That's really interesting. I've not heard that guidance before, but now I am intrigued. That reminded me of another thing that I had a conversation with someone about. Because after the RailsConf talk I gave, which was about testing pain, there was some stubbing involved in the examples that I was showing because I just see a lot of that stuff. And, you know, this audience member kind of had that question of, like, "How do you know that things are working correctly if you have to stub all this stuff out?"
And, you know, sometimes you just have to for the time being [chuckles]. And I wanted to just kind of call back to that idea of having those end-to-end tests testing your critical paths to at least make sure that those things work together in the happy way. Because I have seen, especially with apps that have a lot of service objects, for some reason, those being kind of the highest-level test sometimes. But oftentimes, they end up not being composed well, being quite coupled with other service objects. So, you end up with a lot of stubbing of those in your test for them. And I think that's kind of where you can see things start to break down.
JOËL: Yep. And when the RailsConf videos come out, I recommend seeing Stephanie's talk, some great gems in there for building a more maintainable test suite. Stephanie and I and, you know, most of us here at thoughtbot, we're testing nerds. We think about this a lot. We've also written a lot about this. There are a lot of resources in the show notes for this episode. Check them out. Also, just generally, check out the testing tag on the thoughtbot blog. There is a ton of content there that is worth looking into if you want to dig further into this topic.
STEPHANIE: Yeah, and if you are wanting some, like, dedicated, customized testing training, thoughtbot offers an RSpec workshop that's tailored to your team. And if you kind of are interested in the things we're sharing, we can definitely bring that to your company as well.
JOËL: On that note, shall we wrap up?
STEPHANIE: Let's wrap up. Show notes for this episode can be found at bikeshed.fm.
JOËL: This show has been produced and edited by Mandy Moore.
STEPHANIE: If you enjoyed listening, one really easy way to support the show is to leave us a quick rating or even a review in iTunes. It really helps other folks find the show.
JOËL: If you have any feedback for this or any of our other episodes, you can reach us @_bikeshed, or you can reach me @joelquen on Twitter.
STEPHANIE: Or reach both of us at hosts@bikeshed.fm via email.
JOËL: Thanks so much for listening to The Bike Shed, and we'll see you next week.
ALL: Byeeeeeeeee!!!!!!!!
AD:
Did you know thoughtbot has a referral program? If you introduce us to someone looking for a design or development partner, we will compensate you if they decide to work with us.
More info on our website at: tbot.io/referral. Or you can email us at: referrals@thoughtbot.com with any questions.