Facebook Codebase

0 views
Skip to first unread message

Granville Turley

unread,
Aug 4, 2024, 8:24:21 PM8/4/24
to scaponlanjeo
Facebookfor iOS (FBiOS) is the oldest mobile codebase at Meta. Since the app was rewritten in 2012, it has been worked on by thousands of engineers and shipped to billions of users, and it can support hundreds of engineers iterating on it at a time.

Ultimately, this design exacerbated the creation of nondeterministic code that was very difficult to debug or reproduce bugs. It was clear that this architecture was not sustainable and it was time to rethink it.


After spending a few months building and migrating News Feed to run on a new declarative UI and a new data model, FBiOS saw a 50 percent performance improvement. A few months later, they open-sourced their React-inspired UI framework for mobile, ComponentKit.


To this day, ComponentKit is still the de facto choice for building native UIs in Facebook. It has provided countless performance improvements to the app via view reuse pools, view flattening, and background layout computation. It also inspired its Android counterpart, Litho, and SwiftUI.


Ultimately, the choice to replace the UI and data layer with custom infra was a trade-off. To achieve a delightful user experience that could be reliably maintained, new employees would have to shelve their industry knowledge of Apple APIs to learn the custom in-house infra.


Aside from the runtime failures, dylibs also introduced a new class of linker errors. In the event the code in Facebook (the startup set) referenced code in a dylib, engineers would see a linker error like this:


On top of all that, our only system to safeguard against the introduction of these calls during startup was runtime-based, and many releases were delayed while last-minute regressions were introduced into the app.


There is no doubt that plugins led FBiOS farther away from idiomatic iOS development, but the trade-offs seem to be worth it. Our engineers can change code used in many apps at Meta and be sure that if the plugin system is happy, no app should crash because of missing functionality in a rarely tested codepath. Teams like News Feed and Groups can build an extension point for plugins and be sure that product teams can integrate into their surface without touching the core code.


In 2020, FBiOS began to see a rise in the number of Swift-only APIs from Apple and a growing sentiment for more Swift in the codebase. It was finally time to reconcile with the fact that Swift was an inevitable tenant in FB apps.


Ultimately, the FBiOS team began to advise that product-facing APIs/code should not contain C++ so that we could freely use Swift and future Swift APIs from Apple. Using plugins, FBiOS could abstract away C++ implementations so that they still powered the app but were hidden from most engineers.


This type of workstream signified a bit of shift in the way FBiOS engineers thought about building abstractions. Since 2014, some of the biggest factors in framework building have been contributions to app size and expressiveness (which is why ComponentKit chose Objective-C++ over Objective-C).


To help personalize content, tailor and measure ads and provide a safer experience, we use cookies. By clicking or navigating the site, you agree to allow our collection of information on and off Facebook through cookies. Learn more, including about available controls: Cookie Policy


Meta maintains all their code in a single repository and initially used Git. Due to performance issues attributed to the repo's size, they consulted Git's team, who suggested switching to a multi-repo structure. Meta declined this advice and migrated to Mercurial instead.


Our code base has grown organically and its internal dependencies are very complex. We could have spent a lot of time making it more modular in a way that would be friendly to a source control tool, but there are a number of benefits to using a single repository. Even at our current scale, we often make large changes throughout our code base, and having a single repository is useful for continuous modernization. Splitting it up would make large, atomic refactorings more difficult.


With a mono-repo you can refactor both sides of an API at the same time and commit the change atomically. In other words, it's a mono-repo to support what is effectively a mono-lith: a single giant tightly coupled codebase.


My take away from these, admittedly sparse articles, is that google and meta found it too hard to refactor to a componentized version of their software products. Instead they attacked the face value problems the developers were experiencing, Slow branching and merging of massive codebases, by optimising the source control software.


Those of us old enough to remember SourceSafe will know the pain of slow source control and copying all the files just to make a branch. When someone comes along and makes it 10x faster, that's great and you jump at it. But I don't think I would go back from git's delta based model and any to any merges for love nor money


Static analysis tools are programs that examine, and attempt to draw conclusions about, the source of other programs without running them. At Facebook, we have been investing in advanced static analysis tools that employ reasoning techniques similar to those from program verification. The tools we describe in this article (Infer and Zoncolan) target issues related to crashes and to the security of our services, they perform sometimes complex reasoning spanning many procedures or files, and they are integrated into engineering workflows in a way that attempts to bring value while minimizing friction.


These tools run on code modifications, participating as bots during the code review process. Infer targets our mobile apps as well as our backend C++ code, codebases with 10s of millions of lines; it has seen over 100 thousand reported issues fixed by developers before code reaches production. Zoncolan targets the 100-million lines of Hack code, and is additionally integrated in the workflow used by security engineers. It has led to thousands of fixes of security and privacy bugs, outperforming any other detection method used at Facebook for such vulnerabilities. We will describe the human and technical challenges encountered and lessons we have learned in developing and deploying these analyses.


There has been a tremendous amount of work on static analysis, both in industry and academia, and we will not attempt to survey that material here. Rather, we present our rationale for, and results from, using techniques similar to ones that might be encountered at the edge of the research literature, not only simple techniques that are much easier to make scale. Our goal is to complement other reports on industrial static analysis and formal methods,1,6,13,17 and we hope that such perspectives can provide input both to future research and to further industrial use of static analysis.


Next, we discuss the three dimensions that drive our work: bugs that matter, people, and actioned/missed bugs. The remainder of the article describes our experience developing and deploying the analyses, their impact, and the techniques that underpin our tools.


It is important for a static analysis developer to realize that not all bugs are the same: different bugs can have different levels of importance or severity depending on the context and the nature. A memory leak on a seldom-used service might not be as important as a vulnerability that would allow attackers to gain access to unauthorized information. Additionally, the frequency of a bug type can affect the decision of how important it is to go after. If a certain kind of crash, such as a null pointer error in Java, were happening hourly, then it might be more important to target than a bug of similar severity that occurs only once a year.


People and deployments. While not all bugs are the same, neither are all users; therefore, we use different deployment models depending on the intended audience (that is, the people the analysis tool will be deployed to).


Actioned reports and missed bugs. The goal of an industrial static analysis tool is to help people: at Facebook, this means the engineers, directly, and the people who use our products, indirectly. We have seen how the deployment model can influence whether a tool is successful. Two concepts we use to understand this in more detail, and to help us improve our tools, are actioned reports and observable missed bugs.


A missed bug is one that has been observed in some way, but that was not reported by an analysis. The means of observation can depend on the kind of bug. For security vulnerabilities we have bug bounty reports, security reviews, or SEV reviews. For our mobile apps we log crashes and app not-responding events that occur on mobile devices.


The reason we are interested in advanced static analyses at Facebook might be understood in classic terms as saying: false negatives matter to us. However, it is important to note the number of false negatives is notoriously difficult to quantify (how many unknown bugs are there?). Equally, though less recognized, the false positive rate is challenging to measure for a large, rapidly changing codebase: it would be extremely time consuming for humans to judge all reports as false or true as the code is changing.


Infer and Zoncolan both use techniques similar to some of what one might find at the edge of the research literature. Infer, as we will discuss, uses one analysis based on the theory of Separation Logic,16 with a novel theorem prover that implements an inference technique that guesses assumptions.5 Another Infer analysis involves recently published research results on concurrency analysis.2,10 Zoncolan implements a new modular parallel taint analysis algorithm.


But how can Infer and Zoncolan scale? The core technical features they share are compositionality and carefully crafted abstractions. For most of this article we will concentrate on what one gets from applying Infer and Zoncolan, rather than on their technical properties, but we outline their foundations later and provide more technical details in an online appendix ( =3338112&picked=formats).

3a8082e126
Reply all
Reply to author
Forward
0 new messages