Reviewing the first SDK package heapdump-collector

74 views
Skip to first unread message

Erick Tryzelaar

unread,
Jul 20, 2023, 5:35:49 PM7/20/23
to api-c...@fuchsia.dev, Andrew Pollack, John Wittrock, Fabio D'Urso, Miguel Flores

Hello everyone,


Andrew has just landed support for RFC-0208 for SDK packages in http://fxrev.dev/880035. We are now ready to review our first package, heapdump-collector, which Andrew has prototyped out in http://fxrev.dev/880976.


We don’t yet have a process defined for reviewing SDK packages, so I want to propose we treat the subset of package files that users depend upon as our API. All other files would be internal details that could change over time. In order to support this, the fuchsia_sdk_package() gn template supports specifying these files through two rules:


  • expected_files_present: These files must exist at this package path, but can change over time.

  • expected_files_exact: These files must exist at this package path, but these files should not change without extra approval. This is done by including the file hash in a golden file, which the build system verifies has not changed.


Both of these produce content-checklist files, which allows the fuchsia build system and downstream projects to assert that packages contain the expected files and contents.


In most cases we would only list the component manifests in a package. However, it’s possible we could have data file packages, such as a package full of fonts, which could be exposed as a capability to super-components.


In our case of heapdump, there is a single component manifest collector.cml, which is compiled into the package as meta/heapdump-collector.cm. Since this exposes capabilities to the super-component, we used expected_files_exact for this file so that any capability changes would go through review.

 We’d use expected_files_exact here since removing a capability could break downstream consumers.


We currently are using the partner category, but this was somewhat arbitrarily chosen. The exported FIDL APIs are fuchsia.memory.heapdump.client, which is partner_internal, and  fuchsia.memory.heapdump.process, which is internal.


Finally, we some questions as we wrote this up for consideration:

  • Should we use the most restrictive category internal here? Presumably a super-component of heapdump-collector may only be interested in a subset of the exported capabilities, so it may be okay to use partner or partner_internal with the understanding that OOT users couldn’t use fuchsia.memory.heapdump.process.

  • Do we need to promote fuchsia.memory.heapdump.process to be partner or partner_internal before we add this package to the SDK?

  • If we should use the most restrictive category, should fuchsia_sdk_package() enforce this rule for all SDK packages? If so, would it be alright if we implemented that in a future CL? It wouldn't impact OOT partners, and using our current implementation would unblock them to start prototyping their support for SDK packages.

  • Since this is the first SDK package, we could have made a mistake that’d make it difficult for OOT projects to use the SDK package. Would it make more sense to use the experimental category to allow us to iterate on the design in case we run into issues? Or would it be better to use partner or partner_internal and iterate on the design with OOT projects until we lock in the design?


Thanks,

Erick and Andrew


Fabio D'Urso

unread,
Jul 21, 2023, 10:43:56 AM7/21/23
to api-c...@fuchsia.dev, Erick Tryzelaar, Andrew Pollack, John Wittrock, Miguel Flores, Hunter Freyer, Étienne J. Membrives, Przemek Pietrzkiewicz
On Thu, Jul 20, 2023 at 11:35 PM Erick Tryzelaar <etryz...@google.com> wrote:

Hello everyone,

Hi! 

Andrew has just landed support for RFC-0208 for SDK packages in http://fxrev.dev/880035. We are now ready to review our first package, heapdump-collector, which Andrew has prototyped out in http://fxrev.dev/880976.


Thanks Erick and Andrew for landing support for RFC-0208 and starting the review process!

In addition to the heapdump-collector package, we would like to also add a library (fxrev.dev/870445) and a shard (fxrev.dev/870444) to the SDK.

In short, taking heap memory profiles with heapdump involves three pieces of software:
  • The software being profiled, which must be linked against the libheapdump_instrumentation.so library. It is this library that internally connects to heapdump-collector using fuchsia.memory.heapdump.process (which is currently internal).
  • The ffx profile heapdump client, which connects to heapdump-collector using fuchsia.memory.heapdump.client (which is currently partner_internal).
  • heapdump-collector itself, which serves both protocols and works as an intermediary between the program being profiled and the ffx client.
When a program wants to be profiled, it will have to be rebuilt with four small changes:
  • Link against the libheapdump_instrumentation.so library, which will come from the SDK.
  • Call the heapdump_bind_with_fdio() function at the beginning of its main (whose definition is in #include <heapdump/bind_with_fdio.h>, also from the SDK).
  • Include heapdump's shard (from the SDK) in its component manifest, which will create an instance of heapdump-collector as a child component and use its fuchsia.memory.heapdump.process.Registry protocol.
  • Add the heapdump-collector package as a subpackage, so that the heapdump-collector#meta/heapdump-collector.cm relative URL contained in the shard can actually be resolved.
The ffx client is magic and it contains logic to automatically locate the heapdump-collector, whose position in the component tree depends on what parent component has been instrumented, by finding the fuchsia.memory.heapdump.client.Collector capability it declares.

  • Do we need to promote fuchsia.memory.heapdump.process to be partner or partner_internal before we add this package to the SDK?


We've started to discuss whether fuchsia.memory.heapdump.process can stay internal or not in one of the comments in fxrev.dev/870444.
The idea is that, since this is the protocol between the library and the heapdump-collector package and they both come from the SDK, they will always be updated in lockstep and we can avoid committing to API stability guarantees. Profiled programs never interact directly with this protocol, because all the interactions are hidden within the libheapdump_instrumentation.so library.

Can we continue discussing the other heapdump SDK CLs (fxrev.dev/870445 and fxrev.dev/870444) in this thread or do you prefer me to start a separate one?

Thanks,
Fabio

Fabio D'Urso

unread,
Jul 24, 2023, 9:27:45 AM7/24/23
to api-c...@fuchsia.dev, Hunter Freyer, Erick Tryzelaar, Andrew Pollack, John Wittrock, Miguel Flores, Étienne J. Membrives, Przemek Pietrzkiewicz, Christopher Anderson
Quick update on the CL that had made fuchsia.memory.heapdump.client partner_internal, i.e. fxrev.dev/879252.

I've been asked to revert it because it had not gone through proper API calibration. It's now been reverted and I've created a reland for proper review.
Therefore, the new chain of heapdump CLs for review is:
Sorry for the confusion,
Fabio

Hunter Freyer

unread,
Jul 24, 2023, 9:53:23 AM7/24/23
to Erick Tryzelaar, api-c...@fuchsia.dev, Andrew Pollack, John Wittrock, Fabio D'Urso, Miguel Flores
Thanks so much for starting this conversation!

On Thu, Jul 20, 2023 at 5:35 PM 'Erick Tryzelaar' via api-council <api-c...@fuchsia.dev> wrote:

Hello everyone,


Andrew has just landed support for RFC-0208 for SDK packages in http://fxrev.dev/880035. We are now ready to review our first package, heapdump-collector, which Andrew has prototyped out in http://fxrev.dev/880976.


We don’t yet have a process defined for reviewing SDK packages, so I want to propose we treat the subset of package files that users depend upon as our API. All other files would be internal details that could change over time. In order to support this, the fuchsia_sdk_package() gn template supports specifying these files through two rules:


  • expected_files_present: These files must exist at this package path, but can change over time.

  • expected_files_exact: These files must exist at this package path, but these files should not change without extra approval. This is done by including the file hash in a golden file, which the build system verifies has not changed.


Both of these produce content-checklist files, which allows the fuchsia build system and downstream projects to assert that packages contain the expected files and contents.


In most cases we would only list the component manifests in a package. However, it’s possible we could have data file packages, such as a package full of fonts, which could be exposed as a capability to super-components.


In our case of heapdump, there is a single component manifest collector.cml, which is compiled into the package as meta/heapdump-collector.cm. Since this exposes capabilities to the super-component, we used expected_files_exact for this file so that any capability changes would go through review.

 We’d use expected_files_exact here since removing a capability could break downstream consumers.


We currently are using the partner category, but this was somewhat arbitrarily chosen. The exported FIDL APIs are fuchsia.memory.heapdump.client, which is partner_internal, and  fuchsia.memory.heapdump.process, which is internal.


Finally, we some questions as we wrote this up for consideration:

  • Should we use the most restrictive category internal here? Presumably a super-component of heapdump-collector may only be interested in a subset of the exported capabilities, so it may be okay to use partner or partner_internal with the understanding that OOT users couldn’t use fuchsia.memory.heapdump.process.


Only `partner` can be used by OOT clients (docs). `partner_internal` is for situations where we need to support version skew, but we don't support OOT clients using it directly (e.g., they should just plumb it to some client library). 

IIUC, if we made the package `internal` it shouldn't even get shipped to clients.

  • Do we need to promote fuchsia.memory.heapdump.process to be partner or partner_internal before we add this package to the SDK?


Building on what Fabio says in the side-thread: fuchsia.memory.heapdump.process is kind of an interesting case, since it's not really offered "from the platform to the product", which is how we think of most SDK APIs. Since it's offered by a subpackage to an SDK library, it's sorta "from a product to a product" - the customer could have defined this package and the FIDL protocol and the library all on their own, so it's only in the SDK as a matter of convenience.

I'm not 100% convinced by that logic, though. The nice thing about "internal" protocols is that we can change them however we want and nobody outside the platform should mind. Theoretically, a client could code directly against the FIDL, or could introduce version skew between the subpackage and the library, and if they did and we broke compatibility for them, would we say, "sorry about that!" or would we say, "your use case is not supported"? If the former, then this should move to `partner` or `partner_internal` (depending on which use cases we want to support).

Fabio, do you think `fuchsia.memory.heapdump.process` is ready to go into the SDK? Or do you think that'd be a serious impediment to velocity? 
 
  • If we should use the most restrictive category, should fuchsia_sdk_package() enforce this rule for all SDK packages? If so, would it be alright if we implemented that in a future CL? It wouldn't impact OOT partners, and using our current implementation would unblock them to start prototyping their support for SDK packages.


SDK packages should probably always be `partner`, though we'll see if there are situations where another category makes sense.
 
  • Since this is the first SDK package, we could have made a mistake that’d make it difficult for OOT projects to use the SDK package. Would it make more sense to use the experimental category to allow us to iterate on the design in case we run into issues? Or would it be better to use partner or partner_internal and iterate on the design with OOT projects until we lock in the design?


`experimental` is deprecated and going away. If our partners are using it, it should be `partner`. We work closely enough with them that we should be able to set expectations about the maturity of different parts of the SDK.
 

Thanks,

Erick and Andrew


--
You received this message because you are subscribed to the Google Groups "api-council" group.
To unsubscribe from this group and stop receiving emails from it, send an email to api-council...@fuchsia.dev.
To view this discussion on the web visit https://groups.google.com/a/fuchsia.dev/d/msgid/api-council/CAP%2BUz2bNB6T5GYgXf3%2BqBbyQN0VqVRN-fabn3JLDKFv80jP8ag%40mail.gmail.com.

Fabio D'Urso

unread,
Jul 24, 2023, 12:56:11 PM7/24/23
to Hunter Freyer, Erick Tryzelaar, api-c...@fuchsia.dev, Andrew Pollack, John Wittrock, Miguel Flores
  • Do we need to promote fuchsia.memory.heapdump.process to be partner or partner_internal before we add this package to the SDK?


Building on what Fabio says in the side-thread: fuchsia.memory.heapdump.process is kind of an interesting case, since it's not really offered "from the platform to the product", which is how we think of most SDK APIs. Since it's offered by a subpackage to an SDK library, it's sorta "from a product to a product" - the customer could have defined this package and the FIDL protocol and the library all on their own, so it's only in the SDK as a matter of convenience.

I'm not 100% convinced by that logic, though. The nice thing about "internal" protocols is that we can change them however we want and nobody outside the platform should mind. Theoretically, a client could code directly against the FIDL, or could introduce version skew between the subpackage and the library, and if they did and we broke compatibility for them, would we say, "sorry about that!" or would we say, "your use case is not supported"? If the former, then this should move to `partner` or `partner_internal` (depending on which use cases we want to support).

Fabio, do you think `fuchsia.memory.heapdump.process` is ready to go into the SDK? Or do you think that'd be a serious impediment to velocity? 

The FIDL protocol is actually very simple. However, heapdump makes heavy use of shared memory, and the bulk of the protocol is in the shared VMOs, which are atomically written by the library and snapshot+read by the package (similarly to what Inspect does).
Maintaining ABI compatibility means ensuring that the data structures in shared memory - which are *not* defined in FIDL - stay compatible.

When we started designing the protocol, we didn't know that shipping packages in the SDK (and, as a consequence, the possibility of updating both sides in lockstep) would happen soon, and we actually designed the collector package with forward-compatibility in mind.
In short, the idea is that the FIDL protocol lets the library state what version of the data structures it produces. This version is currently V1, which corresponds to the "RegisterV1" method. If we changed the format of data structures in shared memory, we would add a new "RegisterV2" method in the FIDL protocol (and keep the collector capable of dealing with both V1 and V2 clients). More details are present in the "VMO format" section in heapdump's README file.

Still, given the shiny new possibility of having both sides naturally in sync simply by being both part of the SDK, I would very much prefer to get rid of all this extra complexity and simplify future development by retaining as much flexibility as we can.

Unless I'm missing something, I think that introducing version skew between the library and the package would be hard/impossible to do by mistake if they both come from the SDK. Conceptually, I see that like trying to mix-and-match different prebuilt .so files taken from different SDK versions. I'm for not supporting this use case at all.

As for clients coding directly against the fuchsia.memory.heapdump.process FIDL protocols, such hypothetical clients would end up producing exactly the same data structures in shared memory and implementing exactly the same atomic update algorithms. While this could maybe be useful to support e.g. non C++/Rust languages, an alternative memory profiler may be more interested in directly implementing the fuchsia.memory.heapdump.client protocol instead, which would still make it usable from the ffx client while keeping the freedom to organize in-memory data in a different way.

Fabio

Erick Tryzelaar

unread,
Aug 3, 2023, 11:12:56 AM8/3/23
to Fabio D'Urso, Hunter Freyer, api-c...@fuchsia.dev, Andrew Pollack, John Wittrock, Miguel Flores
Good morning everyone,

So what should be our next step with adding heapdump as an SDK package? Shall we wait until fuchsia.memory.heapdump.client has gone through API review and landed in http://fxrev.dev/888491? Or are there other changes we need to make to heapdump before adding it to the SDK?

Thanks,
Erick

Hunter Freyer

unread,
Aug 3, 2023, 12:13:57 PM8/3/23
to Erick Tryzelaar, Fabio D'Urso, api-c...@fuchsia.dev, Andrew Pollack, John Wittrock, Miguel Flores
While we're working on adding the protocols to the SDK, let's try doing an async calibration on the package itself, where the "API" is the manifest of customer-visible files, combined with the contents of those files themselves. 

Erick, could you fill out go/fuchsia-api-calibration-request and let's just try using the newly semi-standardized process?

Thanks!

Hunter

Erick Tryzelaar

unread,
Aug 9, 2023, 4:45:52 PM8/9/23
to Hunter Freyer, Fabio D'Urso, api-c...@fuchsia.dev, Andrew Pollack, John Wittrock, Miguel Flores
Hello Hunter,

I submitted requests for heapdump-collector and for realm-builder-server. Are there any next steps we need to take?

Hunter Freyer

unread,
Aug 10, 2023, 12:56:56 PM8/10/23
to Erick Tryzelaar, David Turner, Fabio D'Urso, api-c...@fuchsia.dev, Andrew Pollack, John Wittrock, Miguel Flores
Took a look just now. 

I've added +David Turner to one of the CLs to talk about the build rules and how these ought to get folded into the IDK (and from there, the SDKs). Let's iterate on the general pattern of the build rules and what "these kinds of API reviews look like", and then we'll maybe do a more general calibration (though the "API surface" itself is really quite boring so we may save a calibration for future CLs.)

Thanks,
Hunter

Erick Tryzelaar

unread,
Aug 30, 2023, 5:30:37 PM8/30/23
to Hunter Freyer, David Turner, Fabio D'Urso, api-c...@fuchsia.dev, Andrew Pollack, John Wittrock, Miguel Flores
Hello all,

Now that we have landed the realm-builder-server SDK package, I wanted to check in on how we're doing with the heapdump-collector SDK package review in http://fxrev.dev/880976. Are we blocked on getting `fuchsia.memory.heapdump.client partner_internal` through review in http://fxrev.dev/888491?

Hunter Freyer

unread,
Aug 31, 2023, 9:59:10 AM8/31/23
to Erick Tryzelaar, David Turner, Fabio D'Urso, api-c...@fuchsia.dev, Andrew Pollack, John Wittrock, Miguel Flores
On Wed, Aug 30, 2023 at 5:30 PM Erick Tryzelaar <etryz...@google.com> wrote:
Hello all,

Now that we have landed the realm-builder-server SDK package, I wanted to check in on how we're doing with the heapdump-collector SDK package review in http://fxrev.dev/880976. Are we blocked on getting `fuchsia.memory.heapdump.client partner_internal` through review in http://fxrev.dev/888491?

I've just approved that CL, so we should be good to go on the package review!

Hunter Freyer

unread,
Aug 31, 2023, 10:06:46 AM8/31/23
to Erick Tryzelaar, David Turner, Fabio D'Urso, api-c...@fuchsia.dev, Andrew Pollack, John Wittrock, Miguel Flores
On Thu, Aug 31, 2023 at 9:58 AM Hunter Freyer <hjfr...@google.com> wrote:

On Wed, Aug 30, 2023 at 5:30 PM Erick Tryzelaar <etryz...@google.com> wrote:
Hello all,

Now that we have landed the realm-builder-server SDK package, I wanted to check in on how we're doing with the heapdump-collector SDK package review in http://fxrev.dev/880976. Are we blocked on getting `fuchsia.memory.heapdump.client partner_internal` through review in http://fxrev.dev/888491?

I've just approved that CL, so we should be good to go on the package review!

Actually, one last thing: Fabio, can you prepare a CL adding a comment to heapdump-collector.cm indicating that even though fuchsia.memory.heapdump.process isn't in the SDK, it's okay because we don't expect any version skew because (etc etc)? Indicate that we're considering this a special case that we might have to reconsider in the future.

Fabio D'Urso

unread,
Aug 31, 2023, 11:40:56 AM8/31/23
to api-council, Hunter Freyer, David Turner, Fabio D'Urso, api-c...@fuchsia.dev, Andrew Pollack, John Wittrock, Miguel Flores, Erick Tryzelaar
Thanks Hunter! I've prepared fxrev.dev/910632 to add that comment.
Reply all
Reply to author
Forward
0 new messages