return base::Uuid::ParseLowercase(base::StringPrintf(

The way that these UUIDs are handled throughout feels expensive. Every use of this method that I can find (see: [1], [2], [3]) converts to a string, parses as a base::Uuid, and then immediately converts to a string again. Perhaps we can skip some parsing and just have this return the string directly?
[1] std::string uuid_str = XrUuidToBaseUuid(uuid).AsLowercaseString();
start_idx = mesh_processing_offset_;

Doesn't this have the potential to skip submeshes if a submesh that has already been processed is not present in a subsequent frame?
[2] std::string uuid_str = XrUuidToBaseUuid(state.submeshId).AsLowercaseString();
submesh_poses.push_back(state.submeshPoseInBaseSpace);

Related to a previous comment I've given on a CL further down the stack: It's not clear to me from reading https://developer.android.com/develop/xr/openxr/extensions/XR_ANDROID_scene_meshing#XrSceneMeshSnapshotCreationResultANDROID whether the `lastUpdatedTime` describes the pose _and_ the mesh data, or just the mesh data itself. I'm somewhat inclined to believe, given that the pose is reported along with the update time, that the pose might update with every query regardless of whether `lastUpdatedTime` has changed. Would love to get some confirmation on that.
If the pose _can_ update independent of `lastUpdatedTime`, then the current code would potentially cause the tracking of the meshes to drift when the mesh itself is unchanged.
[3] current_frame_uuids.insert(XrUuidToBaseUuid(state.submeshId).AsLowercaseString());
return base::Uuid::ParseLowercase(base::StringPrintf(
Sure, I have changed it to return the std::string directly.
[1] std::string uuid_str = XrUuidToBaseUuid(uuid).AsLowercaseString();
Done
start_idx = mesh_processing_offset_;
Nice point. Since the submesh won't be permanently lost — it will be re-detected the next time the offset wraps back to 0 — the worst case is a delay of up to total_count / kBatchSize frames before it appears. Given that batch processing is disabled by default (kEnableBatchProcessing = false), I think this is acceptable for an MVP. More importantly, once the OpenXR runtime supports it, proper optimizations like bbox-based spatial queries and Level-of-Detail meshing configuration would be more effective and wouldn't have this ordering issue. I'll leave a TODO in the tech report for future improvement of this method.
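For what it's worth, the worst-case figure quoted above is just a ceiling division over the batch cursor; a quick sketch (the function name is illustrative, not something in the CL, and the constants are hypothetical):

```cpp
#include <cassert>
#include <cstddef>

// Worst case: a submesh reappears just after the cursor passed its slot, so
// it waits until the offset wraps back around to it, i.e. up to
// ceil(total_count / batch_size) frames before it is processed again.
constexpr size_t WorstCaseFramesUntilProcessed(size_t total_count,
                                               size_t batch_size) {
  return (total_count + batch_size - 1) / batch_size;  // ceiling division
}
```

With, say, 1000 submeshes and a batch of 100, that bounds the delay at 10 frames.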
[2] std::string uuid_str = XrUuidToBaseUuid(state.submeshId).AsLowercaseString();
Done
Thanks for flagging this, Brandon. My understanding is that lastUpdatedTime reflects any change to the submesh — both geometry and pose — since the pose is an intrinsic property of the submesh entity. If pose updates were not reflected in lastUpdatedTime, there would be no mechanism for the app to detect pose-only changes at all, which would be a fundamental API design issue.
I think a pose-only lastUpdatedTime bump would correct drift rather than cause it, since it signals that the runtime has repositioned the mesh — e.g. after a loop closure or relocalization event.
@niha...@google.com @tor...@google.com Could you confirm this? Specifically, does it get bumped when only the pose changes, or only when the geometry (vertices/indices) changes? I believe you're working on the OpenXR runtime side and would have more info here.
Confirmed by Salar Khan, who implemented XR_ANDROID_scene_meshing. More details can be found in the GChat thread.
submeshes.push_back(submesh);
submesh_poses.push_back(state.submeshPoseInBaseSpace);

nit: every push_back operation may have to reallocate memory and copy the old data. Suggest first filling an std::list<size_t> of the entries which need updating, then reserving memory for the vectors and populating them in a separate loop.
std::vector<std::vector<XrVector3f>> positions_buffers(submeshes.size());
std::vector<std::vector<XrVector3f>> normals_buffers(submeshes.size());
std::vector<std::vector<uint32_t>> indices_buffers(submeshes.size());
std::vector<std::vector<uint8_t>> semantics_buffers(submeshes.size());
for (size_t i = 0; i < submeshes.size(); ++i) {
auto& submesh = submeshes[i];
if (submesh.vertexCountOutput > 0) {
submesh.vertexCapacityInput = submesh.vertexCountOutput;
positions_buffers[i].resize(submesh.vertexCapacityInput);
normals_buffers[i].resize(submesh.vertexCapacityInput);
semantics_buffers[i].resize(submesh.vertexCapacityInput);
submesh.vertexPositions = positions_buffers[i].data();
submesh.vertexNormals = normals_buffers[i].data();
submesh.vertexSemantics = semantics_buffers[i].data();
} else {
submesh.vertexCapacityInput = 0;
submesh.vertexPositions = nullptr;
submesh.vertexNormals = nullptr;
submesh.vertexSemantics = nullptr;
}
if (submesh.indexCountOutput > 0) {
submesh.indexCapacityInput = submesh.indexCountOutput;
indices_buffers[i].resize(submesh.indexCapacityInput);
submesh.indices = indices_buffers[i].data();
} else {
submesh.indexCapacityInput = 0;
submesh.indices = nullptr;
}
}

nit: fragmented buffers can have a performance impact. You could just determine the total vertex and index count, and make each entry in submeshes point to an offset in the buffer.
const auto& positions = positions_buffers[idx];
mesh->vertices.clear();
mesh->vertices.reserve(positions.size() * 3);
for (const auto& pos : positions) {
mesh->vertices.push_back(pos.x);
mesh->vertices.push_back(pos.y);
mesh->vertices.push_back(pos.z);
}
// Indices
if (submesh.indexCountOutput > 0) {
const auto& indices = indices_buffers[idx];
mesh->indices.assign(indices.begin(), indices.end());
} else {
mesh->indices.clear();
mesh->indices.reserve(positions.size());
for (uint32_t j = 0; j < positions.size(); ++j) {
mesh->indices.push_back(j);
}
}
// Semantic label
if (!semantics_buffers[idx].empty()) {
uint8_t first_val = semantics_buffers[idx][0];
auto label_enum = static_cast<XrSceneMeshSemanticLabelANDROID>(first_val);
mesh->semantic_label = ToMojomSemanticLabel(label_enum);
}

Any reason you are first using a staging buffer to fetch the data and then copying to the final buffer?
Why not first prepare mojom::XRMeshData from the submesh states, allocate the buffers, and then use the buffer pointers in xrGetSubmeshData?
Any reason you are first using a staging buffer to fetch the data and then copying to the final buffer?
Why not first prepare mojom::XRMeshData from the submesh states, allocate the buffers, and then use the buffer pointers in xrGetSubmeshData?
Hi Salar, the suggestion works for indices only; it doesn't work cleanly for vertices due to the type mismatch, and it is structurally inapplicable to normals and semantics. I think the current buffer-based approach is the correct design given the incompatibility between OpenXR's C struct types and Mojo's flat float arrays.
submesh_poses.push_back(state.submeshPoseInBaseSpace);
Done
std::vector<std::vector<XrVector3f>> positions_buffers(submeshes.size());
Done
Any reason you are first using a staging buffer to fetch the data and then copying to the final buffer?
Why not first prepare mojom::XRMeshData from the submesh states, allocate the buffers, and then use the buffer pointers in xrGetSubmeshData?
Hi Salar, the suggestion works for indices only; it doesn't work cleanly for vertices due to the type mismatch, and it is structurally inapplicable to normals and semantics. I think the current buffer-based approach is the correct design given the incompatibility between OpenXR's C struct types and Mojo's flat float arrays.
Acknowledged. Yes, I know conversions can add bottlenecks. In the future I suggest copying the raw CPU buffer to a GPU buffer and performing the conversions in a compute shader.
std::list<size_t> indices_needing_update;

nit: call this `state_indices_of_meshes_to_update` or something similar, because just using `indices` makes it confusing whether this refers to the index of the submesh states array or the index buffer.
indices_needing_update.push_back(i);

nit: use emplace_back.

indices_needing_update.push_back(i);

nit: use emplace_back.
// Point each submesh's data fields to its slice of the flat buffers, and
// record each submesh's starting offset for use in the conversion loop.
std::vector<uint32_t> vertex_offsets(submeshes.size());
std::vector<uint32_t> index_offsets(submeshes.size());

these might not be needed because in the loop of conversion to mojo structures you can just reference the mesh data buffer of each `XrSceneSubmeshDataANDROID` element in `submeshes`.
const auto& pos = positions_buffer[v_offset + j];

you could just do
`const XrVector3f& pos = submesh.vertexPositions[j];`
if (submesh.indexCountOutput > 0) {

tbh: you don't really want to use up memory for a mesh with zero triangles. You could also add this to the skip logic.
if (v_count > 0) {

nit: you already have skip logic at the beginning of the loop if the vertex count is zero. This is unnecessary.
if (submesh.indexCountOutput > 0) {
const uint32_t i_offset = index_offsets[idx];
mesh->indices.assign(indices_buffer.begin() + i_offset,
indices_buffer.begin() + i_offset +
submesh.indexCountOutput);
} else {
mesh->indices.reserve(v_count);
for (uint32_t j = 0; j < v_count; ++j) {
mesh->indices.push_back(j);
}
}
// Semantic label
if (v_count > 0) {
uint8_t first_val = semantics_buffer[v_offset];
auto label_enum = static_cast<XrSceneMeshSemanticLabelANDROID>(first_val);
mesh->semantic_label = ToMojomSemanticLabel(label_enum);
}

same scenario here. No need for the offset array. Each `XrSceneSubmeshDataANDROID` already has a pointer to the buffer where its data lives.
Hi Salar, the suggestion works for indices only; it doesn't work cleanly for vertices due to the type mismatch, and it is structurally inapplicable to normals and semantics. I think the current buffer-based approach is the correct design given the incompatibility between OpenXR's C struct types and Mojo's flat float arrays.
Acknowledged. Yes, I know conversions can add bottlenecks. In the future I suggest copying the raw CPU buffer to a GPU buffer and performing the conversions in a compute shader.
I don't quite understand the type mismatch. XrVector3f is just three floats so a vector of XrVector3f should be laid out the same as std::vector<float> right?
Yes. You can safely cast an XrVector3f to a float[3], or even an array of XrVector3f to a float array.
I'm not sure the complication of the flat array is worth it TBH. Otherwise, we should be using base::span more and the actual MeshId type and Id allocator.
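A small sketch of the layout argument, using a stand-in `Vec3f` struct in place of the real `XrVector3f` (assumed to be three packed floats, per the OpenXR spec); the static_asserts are exactly what makes the single-copy flatten valid:

```cpp
#include <cstring>
#include <vector>

// Stand-in for XrVector3f (same shape as the OpenXR struct: three floats).
struct Vec3f { float x, y, z; };

// Compile-time checks that an array of Vec3f is laid out like a float array,
// which is what makes one memcpy into the flat mojom vector well-defined.
static_assert(sizeof(Vec3f) == 3 * sizeof(float), "no padding expected");
static_assert(alignof(Vec3f) == alignof(float), "same alignment as float");

std::vector<float> FlattenPositions(const std::vector<Vec3f>& positions) {
  std::vector<float> out(positions.size() * 3);
  if (!positions.empty()) {
    std::memcpy(out.data(), positions.data(), out.size() * sizeof(float));
  }
  return out;
}
```

This replaces the per-component push_back loop with one bulk copy; the equivalent in Chromium would presumably go through base::span rather than raw memcpy.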
uint64_t GetOrCreateMeshId(const XrUuid& uuid);

We should probably be using MeshId here no? Or at the very least another strongly-typed alias.
mojom::XRSemanticLabel ToMojomSemanticLabel(

Similar to the blink side of things, we should probably move https://source.chromium.org/chromium/chromium/src/+/main:device/vr/openxr/openxr_spatial_plane_manager.cc;drc=9069338c5919cf4fa4f75f17d660a42903765187;l=32 to https://source.chromium.org/chromium/chromium/src/+/main:device/vr/openxr/openxr_util.h or a similar shared spot.
XrSceneMeshingTrackerCreateInfoANDROID create_info{
XR_TYPE_SCENE_MESHING_TRACKER_CREATE_INFO_ANDROID};
create_info.semanticLabelSet =
XR_SCENE_MESH_SEMANTIC_LABEL_SET_DEFAULT_ANDROID;
create_info.enableNormals = XR_TRUE;

Consider using an initializer list e.g.:
XrSceneMeshingTrackerCreateInfoANDROID create_info {
.type = ,
.semanticLabelSet= ...,
.enableNormals = XR_TRUE,
};
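For reference, the designated-initializer form looks like this on a stub struct (the field names mirror the XR create-info but the type and values are placeholders, not the real XR constants):

```cpp
// Stub mirroring the shape of XrSceneMeshingTrackerCreateInfoANDROID
// (field names assumed) to show the designated-initializer form: fields
// must appear in declaration order, and any unlisted fields are
// value-initialized.
struct CreateInfo {
  int type = 0;
  int semanticLabelSet = 0;
  bool enableNormals = false;
};

constexpr CreateInfo MakeCreateInfo() {
  return CreateInfo{
      .type = 7,              // stands in for XR_TYPE_SCENE_MESHING_TRACKER_CREATE_INFO_ANDROID
      .semanticLabelSet = 1,  // stands in for XR_SCENE_MESH_SEMANTIC_LABEL_SET_DEFAULT_ANDROID
      .enableNormals = true,
  };
}
```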
mesh_tracker_ = XR_NULL_HANDLE;
}
submesh_cache_.clear();
uuid_to_id_.clear();
last_valid_mesh_data_.reset();

None of this is probably strictly necessary in the destructor, unless there's some ordering or other concerns that should be documented.
uint64_t new_id = next_mesh_id_++;

We should be using the MeshId type and a generator and returning that value.
DVLOG(2) << "Created new mesh ID " << new_id << " for UUID " << uuid_str;
mojom::XRMeshDetectionDataPtr OpenXrMeshManagerAndroid::GetDetectedMeshesData(

Based on the length, it seems like maybe this could/should be broken up somehow?
XrPosef bbox_center{};
bbox_center.position = {0.0f, 0.0f, 0.0f};
bbox_center.orientation = {0.0f, 0.0f, 0.0f, 1.0f};

Can just use `PoseIdentity()` here: https://source.chromium.org/chromium/chromium/src/+/main:device/vr/openxr/openxr_util.h;drc=9069338c5919cf4fa4f75f17d660a42903765187;l=60
XrPosef bbox_center{};
bbox_center.position = {0.0f, 0.0f, 0.0f};
bbox_center.orientation = {0.0f, 0.0f, 0.0f, 1.0f};
XrBoxf bounding_box;
bounding_box.center = bbox_center;
bounding_box.extents = {kUnboundedSceneHalfExtent, kUnboundedSceneHalfExtent,
kUnboundedSceneHalfExtent};

Use an initializer list; the only Chrome style requirement is to match the order of the members.
size_t start_idx = 0;
size_t end_idx = static_cast<size_t>(submesh_count);
if constexpr (kEnableBatchProcessing) {
start_idx = mesh_processing_offset_;
end_idx = std::min(start_idx + kBatchSize,
static_cast<size_t>(submesh_count));
mesh_processing_offset_ = (end_idx == submesh_count) ? 0 : end_idx;
}
DVLOG(2) << "Processing meshes [" << start_idx << ", " << end_idx
<< ") of " << submesh_count;Chrome style generally frowns on e.g. `_idx` suffixes; but I also think using base::span is cleaner here. You can do something like:
```
auto states_span = base::span(states);
if constexpr (kEnableBatchProcessing) {
  uint32_t mesh_process_count =
      std::min(kBatchSize, submesh_count - mesh_processing_offset_);
  states_span = states_span.subspan(mesh_processing_offset_, mesh_process_count);
  // Can do another variation on the ternary instead
  mesh_processing_offset_ += kBatchSize;
  if (mesh_processing_offset_ >= submesh_count) {
    mesh_processing_offset_ = 0;
  }
}
```
This allows you to flip to the preferred range-based for loop below.
Though, if we don't think we'd ever turn on the batch-based processing it may be simpler to just remove it.
auto cache_it = submesh_cache_.find(uuid_str);

Seems like a good use for
```
auto* mesh = base::FindOrNull(submesh_cache_, uuid_str);
if (!mesh) {
  mesh = &submesh_cache_[uuid_str];
  mesh->mesh_id = GetOrCreateMeshId(state.submeshId);
}
if (mesh->last_updated_time != state.lastUpdatedTime) {
  ...
}
```
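A runnable sketch of the same find-or-insert shape, using `std::unordered_map::try_emplace` in place of the Chromium map helpers (names and the id-generator stand-in are illustrative); try_emplace does a single lookup and reports whether the entry is new, so the mesh id is only minted once per UUID:

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>

// Stand-in for the cache entry type used in the CL (assumed fields).
struct CacheEntry {
  uint64_t mesh_id = 0;
  int64_t last_updated_time = 0;
};

uint64_t g_next_id = 1;  // stands in for the strongly-typed id generator

CacheEntry& GetOrCreate(std::unordered_map<std::string, CacheEntry>& cache,
                        const std::string& uuid) {
  // Single lookup: inserts a default CacheEntry only if the key was absent.
  auto [it, inserted] = cache.try_emplace(uuid);
  if (inserted) {
    it->second.mesh_id = g_next_id++;
  }
  return it->second;
}
```

References into an unordered_map stay valid across later insertions, so callers can hold on to the returned entry while processing the frame.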
// Second pass: now that the count is known, reserve once and populate.

I don't think this is a good approach. I think it'd be more efficient to only iterate through the list once; it seems like the entire purpose of the first iteration is just to avoid vector resizes as you go along. If it truly becomes an issue you could probably do some heuristic to estimate an initial capacity for `submeshes` and `submesh_poses` (the total size in the worst case, or maybe something like 0.5-0.75 of it).
submesh.vertexCapacityInput = 0;
submesh.vertexCountOutput = 0;
submesh.indexCapacityInput = 0;
submesh.indexCountOutput = 0;
submesh.vertexPositions = nullptr;
submesh.vertexNormals = nullptr;
submesh.vertexSemantics = nullptr;
submesh.indices = nullptr;

ISTM that these should be most of the defaults?
if (end_idx == submesh_count) {

Not clear why only in this case?
for (const auto& [uuid_str, cache_entry] : submesh_cache_) {

NIT: `_` if unused I believe is acceptable
extension_helper_->ExtensionMethods().xrDestroySceneMeshSnapshotANDROID(
    snapshot);

Use https://source.chromium.org/chromium/chromium/src/+/main:device/vr/openxr/scoped_openxr_object.h or absl::Cleanup to ensure this gets cleaned up, rather than this easily skippable idiom.
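The cleanup-on-every-path idea can be sketched with a minimal scope guard. This is a generic illustration of what `absl::Cleanup` / `ScopedOpenXrObject` provide, not Chromium's actual helpers, and `DestroySnapshot` is a hypothetical stand-in for the `xrDestroySceneMeshSnapshotANDROID` call:

```cpp
#include <utility>

// Minimal scope guard illustrating the absl::Cleanup idea: the destroy call
// runs on every exit path, so an early return cannot leak the snapshot.
template <typename F>
class ScopeGuard {
 public:
  explicit ScopeGuard(F f) : f_(std::move(f)) {}
  ~ScopeGuard() { f_(); }
  ScopeGuard(const ScopeGuard&) = delete;
  ScopeGuard& operator=(const ScopeGuard&) = delete;

 private:
  F f_;
};

int g_destroy_calls = 0;
void DestroySnapshot(int /*snapshot*/) { ++g_destroy_calls; }

bool ProcessSnapshot(bool fail_early) {
  int snapshot = 42;  // stands in for the XR snapshot handle
  ScopeGuard guard([&] { DestroySnapshot(snapshot); });
  if (fail_early) {
    return false;  // the guard still destroys the snapshot on this path
  }
  return true;
}
```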
// Sum total vertex and index counts so we can allocate one flat, contiguous
// buffer per data type rather than one allocation per submesh.

How many submeshes are we typically seeing? I'm not sure that the likely slight extra overhead is worth the more complicated logic here to make this a flat array instead of e.g. creating an array of a POD struct of vectors for a submesh (and then we could maybe even just make a typemap to abstract the conversion for mojo over the wire).
// Point each submesh's data fields to its slice of the flat buffers, and
// record each submesh's starting offset for use in the conversion loop.
std::vector<uint32_t> vertex_offsets(submeshes.size());
std::vector<uint32_t> index_offsets(submeshes.size());these might not be needed because in the loop of conversion to mojo structures you can just reference the mesh data buffer of each `XrSceneSubmeshDataANDROID` element in `submeshes`.
How long is the lifetime of that data buffer guaranteed? I guess creating the mojo will necessarily have to do a copy anyway, so maybe it doesn't matter.
UNSAFE_BUFFERS(
submesh.vertexPositions = positions_buffer.data() + vertex_offset;
submesh.vertexNormals = normals_buffer.data() + vertex_offset;
  submesh.vertexSemantics = semantics_buffer.data() + vertex_offset;)

I believe base::span's subspan would be beneficial here. You can simply do (for each one) something like:
positions_buffer = positions_buffer.subspan(vertex_offset); and it'll basically advance the start of the array for you. You can then just call `positions_buffer.data()` as needed. This avoids the need for the `SAFETY` comment and adds the bounds checks.
I think `vertex_offset` then no longer needs to be an accumulator; where you do the accumulation now is where you'd do the subspan (or maybe for the flat array you still need it).
UNSAFE_BUFFERS(
submesh.indices = indices_buffer.data() + index_offset;)
index_offset += submesh.indexCountOutput;

Similar comment as above.
Really we should *only* be doing `UNSAFE_BUFFERS` if we need to e.g. turn the resulting array from an OpenXR API into a bounds-checked type.
In the past the security/memory safety team has not really been a fan of such casts, it'd be best to run it by one of them.
mesh->id = device::MeshId(GetOrCreateMeshId(submesh.submeshId));

GetOrCreateMeshId should return a MeshId and not a uint64_t.
XrPosef xr_pose = submesh_poses[idx];
gfx::Point3F position(xr_pose.position.x, xr_pose.position.y,
xr_pose.position.z);
gfx::Quaternion orientation(xr_pose.orientation.x, xr_pose.orientation.y,
xr_pose.orientation.z, xr_pose.orientation.w);
mesh->mojo_from_mesh = device::Pose(position, orientation);

`mesh->mojo_from_mesh = XrPoseToDevicePose(submesh_poses[idx]);`
We should probably be using MeshId here no? Or at the very least another strongly-typed alias.
Done
Similar to the blink side of things should probably move https://source.chromium.org/chromium/chromium/src/+/main:device/vr/openxr/openxr_spatial_plane_manager.cc;drc=9069338c5919cf4fa4f75f17d660a42903765187;l=32 to https://source.chromium.org/chromium/chromium/src/+/main:device/vr/openxr/openxr_util.h or a similar shared spot.
Done
Consider using an initializer list.
Done
None of this is probably strictly necessary in the destructor, unless there's some ordering or other concerns that should be documented.
Thanks. It has been removed.
We should be using hte MeshId type and a generator and returning that value.
Done
DVLOG(2) << "Created new mesh ID " << new_id << " for UUID " << uuid_str;
Done
Based on the length, it seems like maybe this could/should be broken up somehow?
Done
Can just use `PoseIdentity()` here: https://source.chromium.org/chromium/chromium/src/+/main:device/vr/openxr/openxr_util.h;drc=9069338c5919cf4fa4f75f17d660a42903765187;l=60
Done
Use an initializer list; the only Chrome style requirement is to match the order of the members.
Done
Chrome style generally frowns on e.g. `_idx` suffixes; but I also think using base::span is cleaner here. This allows you to flip to the preferred range-based for loop below. Though, if we don't think we'd ever turn on the batch-based processing it may be simpler to just remove it.
Sure, I have removed the batch-based processing as it doesn't give a good user experience. We should use bbox-based and LOD-based queries for future optimization when they become available.
nit: call this `state_indices_of_meshes_to_update` or something similar, because just using `indices` makes it confusing whether this refers to index of submesh states array or index buffer.
This variable was removed in a subsequent refactoring that simplified the loop to a single pass.
Seems like a good use for `base::FindOrNull`.
Done
nit: use emplace_back.
Done
nit: use emplace_back.
Done
I don't think this is a good approach. I think it'd be more efficient to only iterate through the list once; it seems like the entire purpose of the first iteration is just to avoid vector resizes as you go along. If it truly becomes an issue you could probably do some heuristic to estimate an initial capacity for `submeshes` and `submesh_poses` (the total size in the worst case, or maybe something like 0.5-0.75 of it).
I have eliminated the two-pass approach in an earlier comment. The current code is a single loop that builds submeshes and submesh_poses directly.
ISTM that these should be most of the defaults?
Done
Not clear why only in this case?
The guard was removed in the same step where I removed batch processing; the stale cache cleanup now always runs.
NIT: `_` if unused I believe is acceptable
Done
Use https://source.chromium.org/chromium/chromium/src/+/main:device/vr/openxr/scoped_openxr_object.h or absl::Cleanup to ensure this gets cleaned up, rather than this easily skippable idiom.
Done
How many submeshes are we typically seeing? I'm not sure that the likely slight extra overhead is worth the more complicated logic here to make this a flat array instead of e.g. creating an array of a POD struct of vectors for a submesh (and then we could maybe even just make a typemap to abstract the conversion for mojo over the wire).
In practice submesh counts range from ~300 to 1000+, growing as the scanned region expands. Salar suggested in an earlier patchset to use flat contiguous buffers specifically to avoid fragmented allocations at that scale, so keeping the flat buffer approach seems justified here.
these might not be needed because in the loop of conversion to mojo structures you can just reference the mesh data buffer of each `XrSceneSubmeshDataANDROID` element in `submeshes`.
How long is the lifetime of that data buffer guaranteed? I guess creating the mojo will necessarily have to do a copy anyway, so maybe it doesn't matter.
Removed vertex_offsets and index_offsets from SubmeshGeometry and updated the conversion loop to use base::span directly.
Replaced the raw data() + offset pointer arithmetic with four base::span cursors over the flat buffers.
UNSAFE_BUFFERS(
submesh.indices = indices_buffer.data() + index_offset;)
index_offset += submesh.indexCountOutput;
Similar comment as above. Really, we should *only* be using `UNSAFE_BUFFERS` when we need to, e.g. to turn the resulting array from an OpenXR API into a bounds-checked type.
Done
const auto& positions = positions_buffers[idx];
mesh->vertices.clear();
mesh->vertices.reserve(positions.size() * 3);
for (const auto& pos : positions) {
mesh->vertices.push_back(pos.x);
mesh->vertices.push_back(pos.y);
mesh->vertices.push_back(pos.z);
}
// Indices
if (submesh.indexCountOutput > 0) {
const auto& indices = indices_buffers[idx];
mesh->indices.assign(indices.begin(), indices.end());
} else {
mesh->indices.clear();
mesh->indices.reserve(positions.size());
for (uint32_t j = 0; j < positions.size(); ++j) {
mesh->indices.push_back(j);
}
}
// Semantic label
if (!semantics_buffers[idx].empty()) {
uint8_t first_val = semantics_buffers[idx][0];
auto label_enum = static_cast<XrSceneMeshSemanticLabelANDROID>(first_val);
mesh->semantic_label = ToMojomSemanticLabel(label_enum);
}
Adam Ren (xWF)
Any reason you are first using a staging buffer to fetch the data and then copying to the final buffer? Why not first prepare mojom::XRMeshData from the submesh states, allocate the buffers, and then use the buffer pointers in xrGetSubmeshData?
Salar Khan
Hi Salar, the suggestion works for indices only; it doesn't work cleanly for vertices due to a type mismatch, and is structurally inapplicable to normals and semantics. I think the current buffer-based approach is the correct design given the incompatibility between OpenXR's C struct types and Mojo's flat float arrays.
David Li
Acknowledged. Yes, I know conversions can add bottlenecks. In the future I suggest copying the raw CPU buffer to a GPU buffer and performing the conversions in a compute shader.
Salar Khan
I don't quite understand the type mismatch. XrVector3f is just three floats, so a vector of XrVector3f should be laid out the same as std::vector<float>, right?
Alexander Cooper
Yes. You can safely cast an XrVector3f to a float[3], or even an array of XrVector3f to a float array. In the past, though, the security/memory-safety team has not really been a fan of such casts; it'd be best to run it by one of them.
Thanks; per the safety team's preference, I will keep the mojom conversion separate.
mesh->id = device::MeshId(GetOrCreateMeshId(submesh.submeshId));
GetOrCreateMeshId should return a mesh ID and not a uint64_t.
Done
XrPosef xr_pose = submesh_poses[idx];
gfx::Point3F position(xr_pose.position.x, xr_pose.position.y,
xr_pose.position.z);
gfx::Quaternion orientation(xr_pose.orientation.x, xr_pose.orientation.y,
xr_pose.orientation.z, xr_pose.orientation.w);
mesh->mojo_from_mesh = device::Pose(position, orientation);
`mesh->mojo_from_mesh = XrPoseToDevicePose(submesh_poses[idx]);`
Done
you could just do
`const XrVector3f& pos = submesh.vertexPositions[j];`
Done
TBH: you don't really want to use up memory for a mesh with zero triangles. You could also add this to the skip logic.
Done
nit: you already have skip logic at the beginning of the loop if the vertex count is zero. This is unnecessary.
Done
if (submesh.indexCountOutput > 0) {
const uint32_t i_offset = index_offsets[idx];
mesh->indices.assign(indices_buffer.begin() + i_offset,
indices_buffer.begin() + i_offset +
submesh.indexCountOutput);
} else {
mesh->indices.reserve(v_count);
for (uint32_t j = 0; j < v_count; ++j) {
mesh->indices.push_back(j);
}
}
// Semantic label
if (v_count > 0) {
uint8_t first_val = semantics_buffer[v_offset];
auto label_enum = static_cast<XrSceneMeshSemanticLabelANDROID>(first_val);
mesh->semantic_label = ToMojomSemanticLabel(label_enum);
}
Same scenario here. No need for the offset arrays. Each `XrSceneSubmeshDataANDROID` already has a pointer to the buffer where its data exists.
Thanks Salar, the index_offsets and vertex_offsets arrays were removed. The conversion loop now uses base::span(submesh.indices, submesh.indexCountOutput) and base::span(submesh.vertexSemantics, submesh.vertexCountOutput) directly.
std::vector<XrSceneSubmeshDataANDROID> submeshes;
std::vector<XrPosef> submesh_poses;
for (const auto& state : states) {
std::string uuid_str = XrUuidToString(state.submeshId);
auto* entry = base::FindOrNull(submesh_cache_, uuid_str);
if (!entry) {
entry = &submesh_cache_[uuid_str];
entry->mesh_id = GetOrCreateMeshId(state.submeshId);
DVLOG(2) << "New submesh detected";
} else if (entry->last_updated_time == state.lastUpdatedTime) {
continue;
} else {
DVLOG(2) << "Submesh updated";
}
entry->last_updated_time = state.lastUpdatedTime;
submeshes.push_back({
.type = XR_TYPE_SCENE_SUBMESH_DATA_ANDROID,
.submeshId = state.submeshId,
});
submesh_poses.push_back(state.submeshPoseInBaseSpace);
}
Alexander Cooper
nit: Every push_back call here reallocates the underlying vector data and copies the existing data to a new location, which has average O(n^2) runtime. Please bring back the old multipass code where you first create a needs-update std::list<XrUuid>, then reserve the memory for the submesh and submesh_poses data, and populate them in a second-pass loop. You could also incorporate it with the UUID update below, creating current_frame_uuids in parallel with the needs_update list. Also, the more appropriate way to do it is, rather than creating a temporary flat_hash_set, to mark every entry in submesh_cache_ for deletion before getting the submesh states; then, after getting the submesh states and iterating over the state UUIDs, unmark the ones that still exist in the frame. This way you can delete the entries from the cache that are marked for deletion.
I think reading through the list twice is also going to be expensive, especially just to store an index to access through later. I think it'd be sufficient to use some heuristic to reserve an appropriate amount of space and reduce the resize calls (even if that heuristic is "assume all submeshes will be updated").
It's not the double read that will be expensive, but the heap allocation for every push to std::list. What if we just use a std::vector<uint32_t> and reserve it with states.size()? That way we are not wasting too much memory, and we push the indices of the states elements that need new data. In the second pass we iterate this indices vector and use it to get the state, which is useful for populating the submeshes and submesh_poses vectors.
Reserving states.size() for submeshes and submesh_poses would be very expensive and memory-wasting, since those structs use more memory than a simple uint32_t, no?
Btw the maximum number of submeshes this API will provide is 6250.
XrSceneMeshSemanticLabelANDROID label);
I missed in my earlier comment that these were technically two different enums; in that case it'd be fine to leave them as they were, but I'm not going to ask you to flip back at this point.
#if BUILDFLAG(IS_ANDROID)
Doesn't need to be in an `IS_ANDROID` block. The file name "xr_android.h" is used because the extensions are named "XR_ANDROID_FOO", etc.; not because they are necessarily Android-specific, but because "Android" is the vendor name they are proposed under.
Bounding box extents refer to the actual width and height of the bounding box, not the half extent (the absolute vector from the box origin to a corner in box-local space). So you probably need to multiply kUnboundedSceneHalfExtent by 2. Check the definition of XrBoxf:
https://registry.khronos.org/OpenXR/specs/1.1/man/html/XrBoxf.html
Done
VERY IMPORTANT NOTE:
This only tells you the last time the buffers were updated, not the pose. The poses of literally all of the submeshes in the base space can change when you recenter your device. So you need to provide the latest poses as well; otherwise pressing recenter will cause all of the rendered meshes to shift, since they will be using old pose values.
Thanks Salar, it has been corrected.
} else if (entry->last_updated_time == state.lastUpdatedTime) {
continue;
} else {
DVLOG(2) << "Submesh updated";
}
entry->last_updated_time = state.lastUpdatedTime;
submeshes.push_back({
.type = XR_TYPE_SCENE_SUBMESH_DATA_ANDROID,
.submeshId = state.submeshId,
});
NIT: A bit more legible, IMO, if you don't make an if/else chain with the continue, and instead add a separate statement checking the update time once you've guaranteed the entry is valid.
Done
std::vector<XrSceneSubmeshDataANDROID> submeshes;
std::vector<XrPosef> submesh_poses;
for (const auto& state : states) {
std::string uuid_str = XrUuidToString(state.submeshId);
auto* entry = base::FindOrNull(submesh_cache_, uuid_str);
if (!entry) {
entry = &submesh_cache_[uuid_str];
entry->mesh_id = GetOrCreateMeshId(state.submeshId);
DVLOG(2) << "New submesh detected";
} else if (entry->last_updated_time == state.lastUpdatedTime) {
continue;
} else {
DVLOG(2) << "Submesh updated";
}
entry->last_updated_time = state.lastUpdatedTime;
submeshes.push_back({
.type = XR_TYPE_SCENE_SUBMESH_DATA_ANDROID,
.submeshId = state.submeshId,
});
submesh_poses.push_back(state.submeshPoseInBaseSpace);
}
Alexander Cooper
nit: Every push_back call here reallocates the underlying vector data and copies the existing data to a new location, which has average O(n^2) runtime. Please bring back the old multipass code where you first create a needs-update std::list<XrUuid>, then reserve the memory for the submesh and submesh_poses data, and populate them in a second-pass loop. You could also incorporate it with the UUID update below, creating current_frame_uuids in parallel with the needs_update list. Also, the more appropriate way to do it is, rather than creating a temporary flat_hash_set, to mark every entry in submesh_cache_ for deletion before getting the submesh states; then, after getting the submesh states and iterating over the state UUIDs, unmark the ones that still exist in the frame. This way you can delete the entries from the cache that are marked for deletion.
Alexander Cooper
I think reading through the list twice is also going to be expensive, especially just to store an index to access through later. I think it'd be sufficient to use some heuristic to reserve an appropriate amount of space and reduce the resize calls (even if that heuristic is "assume all submeshes will be updated").
TBH, I think we're in the realm of micro-optimizations/trade-offs here. Creating a list of indices leads to random access of the array, which can cause cache misses and wasted processing time; allocating a full object can take up extra memory that we don't need temporarily (and we can always resize down). In this case, I have a slight preference for whatever is more readable, unless we have concrete benchmarks showing one way is better than the other.
Thanks for the detailed discussion. I don't see a clear advantage of one approach over the other in the on-device test, so I'll keep the current design for now for better readability.
// Sum total vertex and index counts so we can allocate one flat, contiguous
// buffer per data type rather than one allocation per submesh.
Adam Ren (xWF)
How many submeshes are we typically seeing? I'm not sure that the likely slight extra overhead is worth the more complicated logic here to make this a flat array instead of, e.g., creating an array of a POD struct of vectors for a submesh (and then we could maybe even just make a typemap to abstract the conversion for mojo over the wire).
Alexander Cooper
In practice, submesh counts range from ~300 to 1000+, growing as the scanned region expands. Salar suggested in an earlier patchset to use flat contiguous buffers specifically to avoid fragmented allocations at that scale, so keeping the flat buffer approach seems justified here.
Adam Ren (xWF)
Can you point me to the comment?
Alexander Cooper
Yeah, it's this one:
#318
Salar Khan
Patchset 8
Mar 04
nit: fragmented buffers can have a performance impact. You could just determine the total vertex and index count, and make each entry in submeshes point to an offset in the buffer.
Salar Khan
TBH, that's a lot weaker of a statement than I would have thought. Did you see a noticeable performance improvement when you switched to this, or was it done in conjunction with a bunch of other changes? How is it consumed on the Blink side?
Alexander Cooper
Calling new multiple times is expensive, since the OS has to do work allocating memory. The maximum total vertices this API will provide is 800,000; total indices, 2,000,000; and total submeshes, 6,250. So we can expect a maximum of 128 vertices and 320 indices per submesh; on average it is much lower, though. However, I do agree there is not a huge difference in performance right now, so I wouldn't mind the use of separate buffers for each submesh's data. But there could be performance bottlenecks in the future when this API is scaled to provide higher-level-of-detail meshes and more submeshes.
Ultimately, though I haven't double checked, it's going to depend on how this data is surfaced to the page as well. If it's all submeshes/independent objects to the page, then we may as well allocate it as such here and now, and only pay that cost once, in a spot where it's easier to try to optimize.
Thanks for the discussion. I'll keep the current flat buffer design for generality and to scale better if the API later provides more submeshes or higher-LOD meshes. The Blink side already consumes this as per-submesh XRMeshData objects after conversion.
I don't think this needs to be in an `IS_ANDROID` block.
Done