{"id":3024,"date":"2019-05-16T09:49:47","date_gmt":"2019-05-16T14:49:47","guid":{"rendered":"https:\/\/mikeconley.ca\/blog\/?p=3024"},"modified":"2023-12-20T16:25:09","modified_gmt":"2023-12-20T21:25:09","slug":"a-few-words-on-main-thread-disk-access-for-general-audiences","status":"publish","type":"post","link":"https:\/\/mikeconley.ca\/blog\/2019\/05\/16\/a-few-words-on-main-thread-disk-access-for-general-audiences\/","title":{"rendered":"A few words on main thread disk access for general audiences"},"content":{"rendered":"\n<p><em>I&#8217;m writing this in lieu of a traditional Firefox Front-end Performance Update, as I think this will be more useful in the long run than just a snapshot of what my team is doing.<\/em><\/p>\n\n\n\n<p>I want to talk about main thread disk access (sometimes referred to more generally as \u201cmain thread IO\u201d). Specifically, I\u2019m going to argue that <em>main thread disk access is lethal to program responsiveness<\/em>. For some folks reading this, that might be an obvious argument not worth making, or one already made ad nauseam \u2014 if that\u2019s you, this blog post is probably not for you. You can go ahead and skip most or all of it, if you\u2019d like. Or just skim it. You never know \u2014 there might be something in here you didn\u2019t know or hadn\u2019t thought about!<br \/><\/p>\n\n\n\n<p>For everybody else, scoot your chairs forward, grab a snack, and read on.<br \/><\/p>\n\n\n\n<p><strong>Disclaimer<\/strong>: I wouldn\u2019t call myself a disk specialist. I don\u2019t work for Western Digital or Seagate. I don\u2019t design file systems. I have, however, been using and writing software for computers for a significant chunk of my life, and I seem to have accumulated a bunch of information about disks. Some of that information might be incorrect or imprecise. 
Please send me mail at mike dot d dot conley at gmail dot com if any of this strikes you as wildly inaccurate (though forgive me if I politely disregard pedantry), and then I can update the post.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">The mechanical parts of a computer<\/h2>\n\n\n\n<p>If you grab a screwdriver and (carefully) open up a laptop or desktop computer, what do you see? Circuit boards, chips, wires and plugs. Lots of electrons flowing around in there, moving quickly and invisibly.<br \/><\/p>\n\n\n\n<p>Notably, there aren\u2019t many <em>mechanical<\/em> moving parts of a modern computer. Nothing to grease up, nowhere to pour lubricant. Opening up my desktop at home, the only moving parts I can really see are the cooling fans near the CPU and power supply (and if you\u2019re like me, you\u2019ll also notice that your cooling fans are caked with dust and in need of a cleaning).<br \/><\/p>\n\n\n\n<p>There\u2019s another moving part that\u2019s harder to see \u2014 the hard drive. This might not be obvious, because most mechanical drives (I\u2019ve heard them sometimes referred to as magnetic drives, spinny drives, physical drives, platter drives and HDDs. There are probably more terms.) hide their moving parts inside of the disk enclosure.<sup id=\"rf1-3024\"><a href=\"#fn1-3024\" title=\"There are also newer forms of disks called &lt;a href=&quot;https:\/\/en.wikipedia.org\/wiki\/Flash_drive&quot;&gt;Flash disks&lt;\/a&gt; and &lt;a href=&quot;https:\/\/en.wikipedia.org\/wiki\/Solid-state_drive&quot;&gt;SSDs&lt;\/a&gt;. 
I\u2019m not really going to cover those in this post.\" rel=\"footnote\">1<\/a><\/sup><br \/><\/p>\n\n\n\n<p>If you ever get the opportunity to open one of these enclosures (perhaps the disk has failed or is otherwise being replaced, and you\u2019re just about to get rid of it) I encourage you to.<br \/><\/p>\n\n\n\n<p>As you disassemble the drive, what you\u2019ll probably notice are circular parts, layered on top of one another on a motor that spins them. In between those circles are little arms that can move back and forth. This next image shows one of those circles, and one of those little arms.<br \/><\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh3.googleusercontent.com\/hscBixhNoeuqI--tT-ieg80dPCRZ6lDPpRhAYvCnl8jA7EUeUU_xZpl5pZWiBJb7a3Qw1IUvGXDi5pJ9YrMPQrDircnFXftcddfyLpdp4V1CDfZPv8sO65EI-j-2mcApWF7InQXD\" alt=\"\"\/><figcaption>There are several of those circles stacked on top of one another, and several of those arms in between them. We&#8217;re only seeing the top one in this photo.<\/figcaption><\/figure>\n\n\n\n<p>Does this remind you of anything? The circular parts remind me of CDs and DVDs, but the arms reaching across them remind me of vinyl players.<br \/><\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh5.googleusercontent.com\/egycQDCIUt49H0qqBC0iRWmwWFtC2fVeCBNc7cF2tODEOcQH7B-7rYJwlEDZlSkx9zEz86ogY2WWwCO7KJCA0Blc55-Sg49H5tc6I_83PP04gUSxrLJjkueSQjTjIF1BKSNNwEpZ\" alt=\"\"\/><figcaption>Vinyl&#8217;s back, baby!<\/figcaption><\/figure>\n\n\n\n<p>The comparison isn\u2019t that outlandish. 
If you ignore some of the lower-level details, CDs, DVDs, vinyl players and hard drives all operate under the same basic principles:<\/p>\n\n\n\n<ol class=\"wp-block-list\"><li>The circular part has information encoded on it.<\/li><li>An arm of some kind is able to reach across the radius of the circular part.<\/li><li>Because the circular part is spinning, the arm is able to reach all parts of it.<\/li><li>The end of the arm is used to read the information encoded on it.<\/li><\/ol>\n\n\n\n<p>There\u2019s some extra complexity for hard drives. Normally there\u2019s more than one spinning platter and one arm, all stacked up, so it\u2019s more like several vinyl players piled on top of one another.<br \/><\/p>\n\n\n\n<p>Hard drives are also typically written to as well as read from, whereas CDs, DVDs and vinyls tend to be written to once, and then used as \u201cread-only memory.\u201d (Though, yes, there are exceptions there.)<br \/><\/p>\n\n\n\n<p>Lastly, for hard drives, there\u2019s a bit I\u2019m skipping over involving caches, where parts of the information encoded on the spinning platters are temporarily held elsewhere for easier and faster access, but we\u2019ll ignore that for now for simplicity, and because it wrecks my simile.<sup id=\"rf2-3024\"><a href=\"#fn2-3024\" title=\"The other thing to keep in mind is that the disk cache can have its contents evicted at any time for reasons that are out of your control. If you time it right, you can maybe increase the probability of a file you want to read being in the cache, but don\u2019t bet the farm on it.\" rel=\"footnote\">2<\/a><\/sup><br \/><\/p>\n\n\n\n<p>So, in general, when you\u2019re asking a computer to read a file off of your hard drive, it\u2019s a bit like asking it to play a tune on a vinyl. 
It needs to find the right starting place to put the needle, then it needs to put the needle there and only then will the song play.<br \/><\/p>\n\n\n\n<p>For hard drives, the act of moving the \u201carm\u201d to find the right spot is called <em>seeking<\/em>.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Contiguous blocks of information and fragmentation<\/h2>\n\n\n\n<p>Have you ever had to defragment your hard drive? What does that even mean? I\u2019m going to spend a few moments trying to explain that at a high-level. Again, if this is something you already understand, go ahead and skip this part.<br \/><\/p>\n\n\n\n<p>Most functional hard drives allow you to do the following useful operations:<\/p>\n\n\n\n<ol class=\"wp-block-list\"><li>Write data to the drive<\/li><li>Read data from the drive<\/li><li>Remove data from the drive<\/li><\/ol>\n\n\n\n<p>That last one is interesting, because usually when you delete a file from your computer, the information isn\u2019t actually erased from the disk. This is true even after emptying your Trash \/ Recycling Bin \u2014 perhaps surprisingly, the files that you asked to be removed are still there encoded on the circular platters as 1\u2019s and 0\u2019s. <a href=\"https:\/\/en.wikipedia.org\/wiki\/Data_recovery\">This is why it\u2019s sometimes possible to recover deleted files even when it seems that all is lost<\/a>.<br \/><\/p>\n\n\n\n<p>Allow me to explain.<br \/><\/p>\n\n\n\n<p>Just like there are different ways of organizing a sock drawer (at random, by colour, by type, by age, by amount of damage), there are ways of organizing a hard drive. These \u201cways\u201d are called file systems. <a href=\"https:\/\/en.wikipedia.org\/wiki\/Comparison_of_file_systems\">There are lots of different file systems<\/a>. If you\u2019re using a modern version of Windows, you\u2019re probably using a file system called <a href=\"https:\/\/en.wikipedia.org\/wiki\/NTFS\">NTFS<\/a>. 
One of the things that a file system is responsible for is <em>knowing where your files are on the spinning platters<\/em>. This file system is also responsible for knowing <em>where there\u2019s free space on the spinning platters to write new data to.<\/em><br \/><\/p>\n\n\n\n<p>When you delete a file, what tends to happen is that your file system marks those <em>sectors<\/em> of the platter as places where new information can be written to, but doesn&#8217;t immediately overwrite those sectors. That&#8217;s one reason why sometimes deleted files can be recovered.<br \/><\/p>\n\n\n\n<p>Depending on your file system, there\u2019s a natural consequence as you delete and write files of different sizes to the hard drive: <a href=\"https:\/\/en.wikipedia.org\/wiki\/File_system_fragmentation\"><em>fragmentation<\/em><\/a>. This kinda sounds like the actual physical disk is falling apart, but that\u2019s not what it means. <a href=\"https:\/\/en.wikipedia.org\/wiki\/Fragmentation_(computing)#Data_fragmentation\"><em>Data fragmentation<\/em><\/a> is probably a more precise way of thinking about it.<br \/><\/p>\n\n\n\n<p>Imagine you have a sheet of white paper broken up into a grid of 5 boxes by 5 boxes (25 boxes in total), and a box of paints and paintbrushes.<br \/><\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh3.googleusercontent.com\/XfvLOZz3H0BB91UQMeGiQsNf7EW2GyL5Q7PDON3HxLkO8Dl5XTslyDI2E2z28e41zEK2GE9Gm7-izG7h0gxKVryPIJDXdCNUcPsFpU1ZufjbHdSc04NMjlrKrt-5I22uvLGpLTJt\" alt=\"\"\/><\/figure>\n\n\n\n<p>Each square on the paper is white to start. Now, starting from the top-left, and going from left-to-right, top-to-bottom, use your paint to fill in 10 of those boxes with the colour red. Now use your paint to fill in the next 5 boxes with blue. 
Now do 3 more boxes with yellow.<br \/><\/p>\n\n\n\n<p>So we\u2019ve got our colour-filled boxes in neat, organized rows (red, then blue, then yellow), and we\u2019ve got 18 of them filled, and 7 of them still white.<br \/><\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh4.googleusercontent.com\/AvpzTCzYL_R-fbUcyM36ChvCGHZGzFmIgcGDihc1ny3nYFKEg6pdRdv94pWY3fkfy4vVy8cBS0YJ_yPXSZ0zXptHZbZLpuVvm1sKM4btswQmBlCm4STti3mjYTIeOTEdjZjddcRz\" alt=\"\"\/><\/figure>\n\n\n\n<p>Now let\u2019s say we don\u2019t care about the colour blue. We\u2019re okay to paint over those now with a new colour. We also want to fill in 10 boxes with the colour purple. Hm\u2026 there aren\u2019t enough free white boxes to put in that many purple ones, but we have these 5 blue ones we can paint over. Let\u2019s paint over them with purple, and then put the next 5 at the end in the white boxes.<br \/><\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh6.googleusercontent.com\/eW6yT1un7oJswFFfDYx6NEJHP3tlI4uKip61KH7wAkKY_rZ1nlSREOVOEzRgfRWtheyCtlrkNecMC4Pi-OgKFTKuplhdIrK2AtSIYLqlMM4CrUrQJhwrskXNnhmmswnaa1PmGVG4\" alt=\"\"\/><\/figure>\n\n\n\n<p>So now 23 of the boxes are filled, we\u2019ve got 2 left at the end that are white, but also, notice that the purple boxes aren\u2019t all together \u2014 they\u2019ve been broken apart into two sections. They\u2019ve been <em>fragmented<\/em>.<br \/><\/p>\n\n\n\n<p>This is an incredibly simplified model, but (I hope) it demonstrates what happens when you delete and write files to a hard drive. Gaps open up that can be written to, and bits and pieces of files end up being distributed across the platters as fragments.<br \/><\/p>\n\n\n\n<p>This also occurs as files grow. 
If, for example, we decided to paint two more white boxes red, we\u2019d need to paint the ones at the very end, breaking up the red boxes so that they\u2019re fragmented.<br \/><\/p>\n\n\n\n<p>So going back to our vinyl player example for a second \u2014 &nbsp;the ideal scenario is that you start a song at the beginning and it plays straight through until the end, right? The more common case with disk drives, however, is you read bits and pieces of a song from different parts of the vinyl: you have to lift and move the arm each time until eventually you have heard the song from start to finish. That seeking of the arm adds overhead to the time it takes to listen to the song from beginning to end.<br \/><\/p>\n\n\n\n<p>When your hard drive undergoes <em>defragmentation<\/em>, what your computer does is try to re-organize your disk so that files are in <em>contiguous sectors<\/em> on the platters. That\u2019s a fancy way of saying that they\u2019re all in a row on the platter, so they can be read in without the overhead of seeking around to assemble it as fragments.<br \/><\/p>\n\n\n\n<p>Skipping that overhead can have huge benefits to your computer\u2019s performance, because <em>the disk is usually the slowest part of your computer.<\/em><br \/><\/p>\n\n\n\n<p>I\u2019ve skipped over and simplified a bunch of stuff here in the interests of brevity, but <a href=\"https:\/\/www.youtube.com\/watch?v=KN8YgJnShPM&amp;feature=youtu.be\">this is a great video that gives a crash course on file systems and storage<\/a>. I encourage you to watch it.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">On the relative input \/ output speeds of modern computing components<\/h2>\n\n\n\n<p>I mentioned in the disclaimer at the start of this post that I\u2019m not a disk specialist or expert. <a href=\"http:\/\/talkingtechwithshd.com\/\">Scott Davis<\/a> is probably a better bet as one of those. 
His <a href=\"http:\/\/talkingtechwithshd.com\/biography\/\">bio<\/a> lists an impressive wealth of experience, and mentions that he\u2019s \u201ca recognized expert in virtualization, clustering, operating systems, cloud computing, file systems, storage, end user computing and cloud native applications.\u201d<br \/><\/p>\n\n\n\n<p>I don\u2019t know Scott at all (if you\u2019re reading this, Hi, Scott!), but let\u2019s just agree for now that he probably knows more about disks than I do.<br \/><\/p>\n\n\n\n<p>I\u2019m picking Scott as an expert because of a particularly illustrative analogy that was <a href=\"https:\/\/blog.infinio.com\/relative-speeds-from-ram-to-flash-to-disk\">posted to a blog for a company he used to work for<\/a>. The analogy compares the speeds of different media that can be used to store information on a computer. Specifically, it compares the following:<br \/><\/p>\n\n\n\n<ol class=\"wp-block-list\"><li>RAM<\/li><li>The network with a decent connection<\/li><li>Flash drives<\/li><li>Magnetic hard drives \u2014 what we\u2019ve been discussing up until now.<\/li><\/ol>\n\n\n\n<p>For these media, the post claims that input \/ output speed can be measured using the following units:<br \/><\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>RAM is in nanoseconds <\/li><li>10GbE Network speed is in microseconds (~50 microseconds)<\/li><li>Flash speed is in microseconds (between 20-500+ microseconds)<\/li><li>Disk speed is in milliseconds<\/li><\/ul>\n\n\n\n<p>That all seems pretty fast. What\u2019s the big deal? Well, it helps if we zoom in a little bit. 
The post does this by supposing that we pretend that RAM speed happens in minutes.<\/p>\n\n\n\n<p>If that\u2019s the case, then we\u2019d have to measure network speed in weeks.<br \/><\/p>\n\n\n\n<p>And if that\u2019s the case, then we\u2019d want to measure the speed of a Flash drive in months.<br \/><\/p>\n\n\n\n<p>And if that\u2019s the case, then we\u2019d have to measure the speed of a magnetic spinny disk in <strong>decades.<\/strong><\/p>\n\n\n\n<p><strong>Update (May 23, 2019)<\/strong>: <em>My Uncle Mark, who also works in computing, sent me links that show similar visualizations of computing latency: this one has <\/em><a href=\"https:\/\/dzone.com\/articles\/scale-computing-latencies\"><em>a really excellent infographic<\/em><\/a><em>, and <\/em><a href=\"https:\/\/www.prowesscorp.com\/computer-latency-at-a-human-scale\/\"><em>this one has more discussion<\/em><\/a><em>. These articles highlight network latency as the worst offender, which is true especially when the quality of service is low, but I&#8217;m mostly writing this post for folks who hack on Firefox where the vast majority of networking occurs off of the main thread.<\/em><\/p>\n\n\n\n<p>I wish I had some ACM paper, or something written by a computer science professor that I could point to you to bolster the following claim. I don\u2019t, not because one doesn\u2019t exist, but because I\u2019m too lazy to look for one. I hope you\u2019ll forgive me for that, but I don\u2019t think I\u2019m saying anything super controversial when I say:<br \/><\/p>\n\n\n\n<p><strong>In the common case, for a personal computer, it\u2019s best to assume that reading and writing to the disk is the slowest operation you can perform.<\/strong><br \/><\/p>\n\n\n\n<p>Sure, there are edge cases where other things in the system might be slower. And there is that disk cache that I breezed over earlier that might make reading or writing cheaper. And sometimes the operating system tries to do smart things to help you. 
For now, just let it go. I\u2019m making a broad generalization that I think covers the common cases, and I\u2019m talking about what\u2019s best to <em>assume.<\/em><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Single and multi-threaded restaurants<\/h2>\n\n\n\n<p>When I try to describe threading and concurrency to someone, I inevitably fall back to the metaphor of cooks in a kitchen in a restaurant. This is a special restaurant where there\u2019s only one seat, for a single customer \u2014 you, the user.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Single-threaded programs<\/h3>\n\n\n\n<p>Let\u2019s imagine a restaurant that\u2019s very, very small and simple. In this restaurant, the cook is also acting as the waiter \/ waitress \/ server. That means when you place your order, the server \/ cook goes into the kitchen and makes it for you. While they\u2019re gone, you can\u2019t really ask for anything else \u2014 the server \/ cook is busy making the thing you asked for last.<br \/><\/p>\n\n\n\n<p>This is how most simple, single-threaded programs work\u2014the user feeds in requests, maybe by clicking a button, or typing something in, maybe something else entirely\u2014and then the program goes off and does it and returns some kind of result. Maybe at that point, the program just exits (\u201cThe restaurant is closed! Come back tomorrow!\u201d), or maybe you can ask for something else. It\u2019s really up to how the restaurant \/ program is designed that dictates this.<br \/><\/p>\n\n\n\n<p>Suppose you\u2019re very, very hungry, and you\u2019ve just ordered a complex five-course meal for yourself at this restaurant. Blanching, your server \/ cook goes off to the kitchen. While they\u2019re gone, nobody is refilling your water glass or giving you breadsticks. 
You\u2019re <em>pretty<\/em> sure there\u2019s activity going on in the kitchen and that the server \/ cook hasn\u2019t had a heart attack back there, but you\u2019re going to be waiting a looooong time since there\u2019s only one person working in this place.<br \/><\/p>\n\n\n\n<p>Maybe in some restaurants, the server \/ cook will dash out periodically to refill your water glass, give you some breadsticks, and update you on how things are going, but it sure would be nice if we gave this person some help back there, wouldn\u2019t it?<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Multi-threaded programs<\/h3>\n\n\n\n<p>Let\u2019s imagine a slightly different restaurant. There are more cooks in the kitchen. The server is available to take your order (but is also able to cook in the kitchen if need be), and you make your request from the menu.<br \/><\/p>\n\n\n\n<p>Now suppose again that you order a five-course meal. The server goes to the kitchen and tells the cooks what you just ordered. In this restaurant, suppose the kitchen staff are a really great team and don\u2019t get in each other\u2019s way<sup id=\"rf3-3024\"><a href=\"#fn3-3024\" title=\"When writing multi-threaded programs, this is much harder than it sounds! Mozilla actually &lt;a href=&quot;https:\/\/www.rust-lang.org\/&quot;&gt;developed a whole new programming language&lt;\/a&gt; to make that easier to do correctly.\" rel=\"footnote\">3<\/a><\/sup>, so they divide up the order in a way that makes sense and get to work.<br \/><\/p>\n\n\n\n<p>The server can come back and refill your water glass, feed you breadsticks, perhaps they can tell you an entertaining joke, perhaps they can take additional orders that won\u2019t take as long. 
At any rate, in this restaurant, the interaction between the user and the server is frequent and rarely interrupted.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">The waiter \/ waitress \/ server is the main thread<\/h3>\n\n\n\n<p>In these two examples, the waiter \/ waitress \/ server is what is usually called the <em>main thread of execution<\/em>, which is the part of the program that the user interacts with most directly. By moving expensive operations <em>off of the main thread<\/em>, the <em>responsiveness<\/em> of the program increases.<br \/><\/p>\n\n\n\n<p>Have you ever seen the mouse turn into an hourglass, or seen the \u201cThis program is not responding\u201d message on Windows? Or the spinny colourful pinwheel on macOS? In those cases, the main thread is off doing something and never came back to give you your order or refill your water or breadsticks \u2014 that\u2019s how it generally manifests in common operating systems. The program seems \u201cunresponsive\u201d, \u201csluggish\u201d, \u201cfrozen\u201d. It\u2019s \u201changing\u201d, or \u201cstuck\u201d. When I hear those words, my immediate assumption is that the main thread is busy doing something \u2014 either it\u2019s taking a long time (it\u2019s making you your massive five-course meal, maybe not as efficiently as it could), or it\u2019s stuck (maybe they fell down a well!).<br \/><\/p>\n\n\n\n<p>In either case, the general rule of thumb for improving program responsiveness is to keep the server filling the user\u2019s water and breadsticks by offloading complex things on the menu to other cooks in the kitchen.<br \/><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Accessing the disk on the main thread<\/h2>\n\n\n\n<p>Recall that <em>in the common case, for a personal computer, it\u2019s best to assume that reading and writing to the disk is the slowest operation you can perform. 
<\/em>In our restaurant example, reading or writing to the disk on the main thread is a bit like having your server hop onto their bike and ride out to the next town over to grab some groceries to help make what you ordered.<br \/><\/p>\n\n\n\n<p>And sometimes, because of data fragmentation (not everything is all in one place), the server has to search amongst many many shelves all widely spaced apart to get everything.<br \/><\/p>\n\n\n\n<p>And sometimes the grocery store is <em>very busy<\/em> because there are other restaurants out there that are grabbing supplies.<br \/><\/p>\n\n\n\n<p>And sometimes there are police checks (anti-virus \/ anti-malware software) occurring for passengers along the road, where they all have to show their IDs before being allowed through.<br \/><\/p>\n\n\n\n<p>It\u2019s an incredibly slow operation. Hopefully by the time the server comes back, they don\u2019t realize they have to <em>go back out again to get more<\/em>, but they might if they didn\u2019t realize they were missing some more ingredients.<sup id=\"rf4-3024\"><a href=\"#fn4-3024\" title=\"Keen readers might notice I\u2019m leaving out a discussion on &lt;a href=&quot;https:\/\/en.wikipedia.org\/wiki\/Paging&quot;&gt;Paging&lt;\/a&gt;. That\u2019s because this blog post is getting quite long, and because it kinda breaks the analogy a bit \u2014 who sends groceries back to a grocery store?\" rel=\"footnote\">4<\/a><\/sup><br \/><\/p>\n\n\n\n<p>Slow slow slow. And unresponsive. And a great way to lose a hungry customer.<br \/><\/p>\n\n\n\n<p>For super small programs, where the kitchen is well stocked, or the ride to the grocery store doesn\u2019t need to happen often, having a single-thread and having it read or write is usually okay. 
I\u2019ve certainly written my fair share of utility programs or scripts that do main thread disk access.<br \/><\/p>\n\n\n\n<p><a href=\"https:\/\/www.mozilla.org\/en-US\/firefox\/new\/\">Firefox<\/a>, the program I spend most of my time working on as my job, is not a small program. It\u2019s a very, very, very large program. Using our restaurant model, it\u2019s many large restaurants with many many cooks on staff. The restaurants communicate with each other and ship food and supplies back and forth using messenger bikes, to provide to you, the customer, the best meals possible.<br \/><\/p>\n\n\n\n<p>But even with this large set of restaurants, there\u2019s still only a single waiter \/ waitress \/ server \/ main thread of execution as the point of contact with the user.<br \/><\/p>\n\n\n\n<p>Part of my job is to help organize the workflows of this restaurant so that they provide those meals <em>as quickly as possible<\/em>. Sending the server to the grocery store (main thread disk access) is part of the workflow that we <em>absolutely need to strike from the list<\/em>.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Start-up main-thread disk access<\/h2>\n\n\n\n<p>Going back to our analogy, imagine starting the program like opening the restaurant. The lights go on, the chairs come off of the tables, the kitchen gets warmed up, and prep begins.<br \/><\/p>\n\n\n\n<p>While this is occurring, it\u2019s all hands on deck \u2014 the server might be off in the kitchen helping to do prep, off getting cutlery organized, whatever it takes to get the restaurant open and ready to serve. Before the restaurant is open, there\u2019s no point in having the server be idle, because the customer hasn\u2019t been able to come in yet.<br \/><\/p>\n\n\n\n<p>So if critical groceries and supplies needed to open the restaurant need to be gotten before the restaurant is open, it\u2019s fine to send the server to the store. 
Somebody has to do it.<br \/><\/p>\n\n\n\n<p>For Firefox, there are various things that need to take place before we can display any UI. At that point, it\u2019s usually fine to do main-thread disk access, so long as all of the things being read or written are kept to an <em>absolute minimum<\/em>. Find how much you need to do, and reduce it as much as possible.<br \/><\/p>\n\n\n\n<p>But as soon as UI is presented to the user, the restaurant is open. At that point, the server should stay off their bike and keep chatting with the customer, even if the kitchen hasn\u2019t finished setting up and getting all of their supplies. So to stay responsive, don\u2019t do disk access on the main thread of execution after you\u2019ve started to show the user some kind of UI.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Disk contention<\/h2>\n\n\n\n<p>There\u2019s one last complication I want to capture here with our restaurant example before I wrap up. I\u2019ve been saying that it\u2019s important to send anyone except the server to the grocery store for supplies. That\u2019s true \u2014 but be careful of sending <em>too many other people at the same time<\/em>.<br \/><\/p>\n\n\n\n<p>Moving disk access off of the main thread is good for responsiveness, full stop. However, it might do nothing to actually improve the overall time that it takes to complete some amount of work. Put it another way: just because the server is refilling your glass and giving you breadsticks doesn\u2019t mean that your five-course meal is going to show up any faster.<br \/><\/p>\n\n\n\n<p>Also, <em>disk operations on magnetic drives do not have a constant speed<\/em>. 
Having the disk do many things at once within a single program or across multiple programs can slow the whole set of operations down due to the overhead of <em>seeking <\/em>and <em>context switching<\/em>, since the operating system will try to serve all disk requests at once, more or less.<sup id=\"rf5-3024\"><a href=\"#fn5-3024\" title=\"I\u2019ve never worked on an operating system, but I believe most modern operating systems try to do a bunch of smart things here to schedule disk requests in efficient ways.\" rel=\"footnote\">5<\/a><\/sup><br \/><\/p>\n\n\n\n<p>Disk contention and main thread disk access are things I think about a lot these days while my team and I work on improving Firefox start-up performance.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Some questions to ask yourself when touching disk<\/h2>\n\n\n\n<p>So it\u2019s important to be thoughtful about disk access. Are you working on code that touches disk? Here are some things to think about:<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Is UI visible, and responsiveness a goal?<\/h3>\n\n\n\n<p>If so, best to move the disk access off of the main thread. That was the main thing I wanted to capture, and I hope I\u2019ve convinced you of that point by now.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Does the access need to occur?<\/h3>\n\n\n\n<p>As programs age and grow and contributors come and go, sometimes it\u2019s important to take a step back and ask, \u201cAre the assumptions of this disk access still valid? Does this access need to happen at all?\u201d The fastest code is the code that doesn\u2019t run at all.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">What else is happening during this disk access? Can disk access be prioritized more efficiently?<\/h3>\n\n\n\n<p>This is often trickier to answer as a program continues to run. 
Thankfully, tools like <a href=\"https:\/\/en.wikipedia.org\/wiki\/Profiling_(computer_programming)\">profilers<\/a> can help capture recordings of things like disk access to gain evidence of simultaneous disk access. <br \/><\/p>\n\n\n\n<p>Start-up is a special case though, since there\u2019s usually a somewhat deterministic \/ reliably stable set of operations that occur in the same way in roughly the same order during start-up. For start-up, using a tool like a profiler, you can gain a picture of the sorts of things that tend to happen during that special window of time. If you notice a lot of disk activity occurring simultaneously across multiple threads, perhaps ponder if there\u2019s a better way of ordering those operations so that the most important ones complete first.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Can we reduce how much we need to read or write?<\/h3>\n\n\n\n<p>There are lots of wonderful compression algorithms out there with a variety of performance characteristics that might be worth pondering. It might be worth considering compressing the data that you\u2019re storing before writing it so that the disk has to write less and read less.<br \/><\/p>\n\n\n\n<p>Of course, there\u2019s compression and decompression overhead to consider here. Is it worth the CPU time to save the disk time? Is there some other CPU intensive task that is more critical that\u2019s occurring?<br \/><\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Can we organize the things that we want to read ahead of time so that they\u2019re more likely to be read contiguously (without seeking the disk)?<\/h3>\n\n\n\n<p>If you know ahead of time the sorts of things that you\u2019re going to be reading off of the disk, it\u2019s generally a good strategy to store them in that read order. That way, in the best case scenario (the disk is defragmented), the read head can fly along the sectors and read everything in, in exactly the right order you want them. 
If the user has defragmented their disk, but the things you\u2019re asking for are all out of order on the disk, you\u2019re adding overhead to seek around to get what you want.<br \/><\/p>\n\n\n\n<p>Supposing that the data on the disk is fragmented, I suspect having the files in order anyways is probably better than not, but I don\u2019t think I know enough to prove it.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Flawed but useful<\/h2>\n\n\n\n<p>One of my mentors, <a href=\"http:\/\/third-bit.com\/\">Greg Wilson<\/a>, likes to say that \u201call models are flawed, but some are useful\u201d. <a href=\"https:\/\/en.wikipedia.org\/wiki\/All_models_are_wrong\">I don&#8217;t think he coined it<\/a>, but he uses it in the right places at the right times, and to me, that\u2019s what counts.<br \/><\/p>\n\n\n\n<p>The information in this post is not exhaustive \u2014 I glossed over and left out a lot. It\u2019s flawed. Still, I hope it can be useful to you.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Thanks<\/h2>\n\n\n\n<p>Thanks to the following folks who read drafts of this and gave feedback:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Mandy Cheang<\/li><li>Emily Derr<\/li><li>Gijs Kruitbosch<\/li><li>Doug Thayer<\/li><li>Florian Qu\u00e8ze<\/li><\/ul>\n<hr class=\"footnotes\"><ol class=\"footnotes\" style=\"list-style-type:decimal\"><li id=\"fn1-3024\"><p >There are also newer forms of disks called <a href=\"https:\/\/en.wikipedia.org\/wiki\/Flash_drive\">Flash disks<\/a> and <a href=\"https:\/\/en.wikipedia.org\/wiki\/Solid-state_drive\">SSDs<\/a>. I\u2019m not really going to cover those in this post.&nbsp;<a href=\"#rf1-3024\" class=\"backlink\" title=\"Return to footnote 1.\">&#8617;<\/a><\/p><\/li><li id=\"fn2-3024\"><p >The other thing to keep in mind is that the disk cache can have its contents evicted at any time for reasons that are out of your control. 
If you time it right, you can maybe increase the probability of a file you want to read being in the cache, but don\u2019t bet the farm on it.&nbsp;<a href=\"#rf2-3024\" class=\"backlink\" title=\"Return to footnote 2.\">&#8617;<\/a><\/p><\/li><li id=\"fn3-3024\"><p >When writing multi-threaded programs, this is much harder than it sounds! Mozilla actually <a href=\"https:\/\/www.rust-lang.org\/\">developed a whole new programming language<\/a> to make that easier to do correctly.&nbsp;<a href=\"#rf3-3024\" class=\"backlink\" title=\"Return to footnote 3.\">&#8617;<\/a><\/p><\/li><li id=\"fn4-3024\"><p >Keen readers might notice I\u2019m leaving out a discussion on <a href=\"https:\/\/en.wikipedia.org\/wiki\/Paging\">Paging<\/a>. That\u2019s because this blog post is getting quite long, and because it kinda breaks the analogy a bit \u2014 who sends groceries back to a grocery store?&nbsp;<a href=\"#rf4-3024\" class=\"backlink\" title=\"Return to footnote 4.\">&#8617;<\/a><\/p><\/li><li id=\"fn5-3024\"><p >I\u2019ve never worked on an operating system, but I believe most modern operating systems try to do a bunch of smart things here to schedule disk requests in efficient ways.&nbsp;<a href=\"#rf5-3024\" class=\"backlink\" title=\"Return to footnote 5.\">&#8617;<\/a><\/p><\/li><\/ol>","protected":false},"excerpt":{"rendered":"<p>I&#8217;m writing this in lieu of a traditional Firefox Front-end Performance Update, as I think this will be more useful in the long run than just a snapshot of what my team is doing. I want to talk about main thread disk access (sometimes referred to more generally as \u201cmain thread IO\u201d). 
Specifically, I\u2019m going [&hellip;]<\/p>\n","protected":false},"author":4,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_feature_clip_id":0,"_jetpack_memberships_contains_paid_content":false,"footnotes":"","jetpack_publicize_message":"","jetpack_publicize_feature_enabled":true,"jetpack_social_post_already_shared":true,"jetpack_social_options":{"image_generator_settings":{"template":"highway","default_image_id":0,"font":"","enabled":false},"version":2},"jetpack_post_was_ever_published":false},"categories":[5,874,861,79],"tags":[],"class_list":["post-3024","post","type-post","status-publish","format-standard","hentry","category-computer-science","category-firefox-mozilla-2","category-mozilla-2","category-technology"],"jetpack_publicize_connections":[],"jetpack_featured_media_url":"","jetpack_shortlink":"https:\/\/wp.me\/prmTy-MM","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/mikeconley.ca\/blog\/wp-json\/wp\/v2\/posts\/3024","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mikeconley.ca\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mikeconley.ca\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mikeconley.ca\/blog\/wp-json\/wp\/v2\/users\/4"}],"replies":[{"embeddable":true,"href":"https:\/\/mikeconley.ca\/blog\/wp-json\/wp\/v2\/comments?post=3024"}],"version-history":[{"count":3,"href":"https:\/\/mikeconley.ca\/blog\/wp-json\/wp\/v2\/posts\/3024\/revisions"}],"predecessor-version":[{"id":3028,"href":"https:\/\/mikeconley.ca\/blog\/wp-json\/wp\/v2\/posts\/3024\/revisions\/3028"}],"wp:attachment":[{"href":"https:\/\/mikeconley.ca\/blog\/wp-json\/wp\/v2\/media?parent=3024"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"h
ttps:\/\/mikeconley.ca\/blog\/wp-json\/wp\/v2\/categories?post=3024"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mikeconley.ca\/blog\/wp-json\/wp\/v2\/tags?post=3024"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}