[JIRA] (WTES-69) AEM Mocks Build fail sporadically with "Unable to inject mandatory reference" error

225 views
Skip to first unread message

Stefan Seifert (JIRA)

unread,
Nov 19, 2021, 10:20:32 AM11/19/21
to wcm-i...@googlegroups.com

[ https://wcm-io.atlassian.net/browse/WTES-69?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stefan Seifert updated WTES-69:
-------------------------------
Summary: AEM Mocks Build fail sporadically with "Unable to inject mandatory reference" error (was: AEM Mocks Build fails sporadically with "Unable to inject mandatory reference" error)

> AEM Mocks Build fail sporadically with "Unable to inject mandatory reference" error
> -----------------------------------------------------------------------------------
>
> Key: WTES-69
> URL: https://wcm-io.atlassian.net/browse/WTES-69
> Project: wcm.io Testing
> Issue Type: Bug
> Components: AEM Mocks
> Reporter: Stefan Seifert
> Assignee: Stefan Seifert
>
> from time to time builds for AEM Mocks are failing with errors like:
> {noformat}
> MockComponentContextTest.testGetParent For value JCR_OAK: Unable to inject mandatory reference 'adapterManager' for class org.apache.sling.models.impl.ModelAdapterFactory : no matching services were found.
> {noformat}
> it's not always the same service that is affected. most times it's one of adapterManager, eventAdmin or both.
> recent examples:
> * https://github.com/wcm-io/wcm-io-testing/runs/4265710010?check_suite_focus=true
> * https://github.com/wcm-io/wcm-io-testing/runs/4265557848?check_suite_focus=true
> seems to be a racing condition or a concurrency problem. we are running the junit 5 tests in parallel (see [parent_toplevel config|https://github.com/wcm-io/wcm-io-tooling/blob/6e200cfbdf6e4016771720a5974df1bf66fba2b8/parent_toplevel/pom.xml#L387-L394]).



--
This message was sent by Atlassian Jira
(v1001.0.0-SNAPSHOT#100183)

Stefan Seifert (JIRA)

unread,
Nov 19, 2021, 2:40:07 PM11/19/21
to wcm-i...@googlegroups.com

[ https://wcm-io.atlassian.net/browse/WTES-69?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=21387#comment-21387 ]

Stefan Seifert commented on WTES-69:
------------------------------------

A) i played a bit around with two branches - one with parallel testing enabled
[https://github.com/wcm-io/wcm-io-testing/tree/feature/run-with-parallel-tests|https://github.com/wcm-io/wcm-io-testing/tree/feature/run-with-parallel-tests|smart-link]

B) and one with parallel testing disabled
[https://github.com/wcm-io/wcm-io-testing/tree/feature/deactivate-parallel-tests|https://github.com/wcm-io/wcm-io-testing/tree/feature/deactivate-parallel-tests|smart-link]

did 8 runs with each on github - A) succeeded 7 of 8 times, B) succeeded 8 of 8 times.

additionally, i saw no performance benefit in running the tests in parallel - in contrary, it seems the run without parallel testing seems to be sometimes slightly faster…?

Stefan Seifert (JIRA)

unread,
Nov 25, 2021, 10:05:57 AM11/25/21
to wcm-i...@googlegroups.com

[ https://wcm-io.atlassian.net/browse/WTES-69?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=21390#comment-21390 ]

Stefan Seifert commented on WTES-69:
------------------------------------

in the meantime, i’ve switched off the parallel running of unit tests on the develop branch, as it did not have any performance improvements. develop branch compiles quite stable, looks good.

but: the problem still occurs regularly if compiled against latest AEM dependencies. we’ve 4 branches compiling against different AEM dependencies (and the related sling and other 3rdparty bundle versions):

* AEM 6.4.0: [https://github.com/wcm-io/wcm-io-testing/tree/develop|https://github.com/wcm-io/wcm-io-testing/tree/develop]
* AEM 6.5.0: [https://github.com/wcm-io/wcm-io-testing/tree/testing/aem65-deps|https://github.com/wcm-io/wcm-io-testing/tree/testing/aem65-deps]
* AEM 6.5.11: [https://github.com/wcm-io/wcm-io-testing/tree/testing/aem6511-deps|https://github.com/wcm-io/wcm-io-testing/tree/testing/aem6511-deps]
* AEMaaCS: [https://github.com/wcm-io/wcm-io-testing/tree/testing/aemcloud-deps|https://github.com/wcm-io/wcm-io-testing/tree/testing/aemcloud-deps]

i did a couple of test runs, and those for AEM 6.5.11 and AEMaaCS fail quite regularly (~30-50% failure rate from the couple runs i did) - whereas the branches for quite old dependencies (AEM 6.4.0 and AEM 6.5.0) did not show this behavior.

in case this is not just bad luck it looks that this problem is not directly related to a bug e.g. in osgi-mock (that should show up in all branches), but seems to be related to a combination of osgi-mock and some of the latest sling or other 3rdparty bundles. this is a bit strange, as it looks like a racing condition in osgi-mock, where as service which was registered before seems to be not ready when another component depending on it is registered later.

maybe i’ll test with other AEM 6.5.x dependencies between AEM 6.5.0 and 6.5.11 as well to see at which point in time it starts breaking. then we can see what was the difference between those set of dependencies.

Stefan Seifert (JIRA)

unread,
Nov 25, 2021, 2:06:29 PM11/25/21
to wcm-i...@googlegroups.com

[ https://wcm-io.atlassian.net/browse/WTES-69?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=21391#comment-21391 ]

Stefan Seifert commented on WTES-69:
------------------------------------

eventually i was able to reproduce the problem with all service pack levels 6.4.0, 6.5.0 up to 6.5.11 - the older ones just took considerable more attempts to fail.

so i’m coming back to the theory that there is something fishy in the osgi-mock component registry, and it was probably not uncovered before because it failed only on very rare occasions with the older service packs.

Stefan Seifert (JIRA)

unread,
Nov 25, 2021, 2:36:35 PM11/25/21
to wcm-i...@googlegroups.com

[ https://wcm-io.atlassian.net/browse/WTES-69?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=21393#comment-21393 ]

Stefan Seifert commented on WTES-69:
------------------------------------

collections of services that were expected but missing for injection in the various failed test runs:

* {{ 'adapterManager' (org.apache.sling.api.adapter.AdapterManager) for class org.apache.sling.models.impl.ModelAdapterFactory }}
* {{ 'eventAdmin' (org.osgi.service.event.EventAdmin) for class io.wcm.testing.mock.aem.dam.MockAemDamAdapterFactory }}
* {{ 'resourceAccessSecurityTracker' (org.apache.sling.resourceresolver.impl.ResourceAccessSecurityTracker) for class org.apache.sling.resourceresolver.impl.ResourceResolverFactoryActivator }}
* {{ 'stringInterpolationProvider' (org.apache.sling.resourceresolver.impl.mapping.StringInterpolationProvider) for class org.apache.sling.resourceresolver.impl.ResourceResolverFactoryActivator }}

Stefan Seifert (JIRA)

unread,
Nov 29, 2021, 10:28:38 AM11/29/21
to wcm-i...@googlegroups.com

[ https://wcm-io.atlassian.net/browse/WTES-69?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stefan Seifert resolved WTES-69.
--------------------------------
Fix versions: AEM Mocks 4.1.6
Resolution: Fixed

it seem that [SLING-10944|https://issues.apache.org/jira/browse/SLING-10944] was the root cause for this issue. so ti should be fixed with next release for AEM Mocks 4.1.6.

i also enabled again the parallel execution of unit tests for the aem-mock project itself - it runs fine as well (and despite previous observations seems to increase the test run in average a bit by ~20-25%).

> AEM Mocks Build fail sporadically with "Unable to inject mandatory reference" error
> -----------------------------------------------------------------------------------
>
> Key: WTES-69
> URL: https://wcm-io.atlassian.net/browse/WTES-69
> Project: wcm.io Testing
> Issue Type: Bug
> Components: AEM Mocks
> Reporter: Stefan Seifert
> Assignee: Stefan Seifert
> Fix For: AEM Mocks 4.1.6

Stefan Seifert (JIRA)

unread,
Dec 7, 2021, 4:20:16 AM12/7/21
to wcm-i...@googlegroups.com

[ https://wcm-io.atlassian.net/browse/WTES-69?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stefan Seifert reopened WTES-69:
--------------------------------

i've disocvered that this problem still occurs.

strange thing is that it does not occur when DEBUG logging for osgi-mock is enabled.
it does occur in 30-50% of aem-mock test runs when DEBUG logging is disabled for osgi-mock (does not matter if DEBUG loggin for sling-mock/aem-mock is enabled or not).

i already did some thorough analyses of the actual debug log statements and ifDebugEnabled blocks in osgi-mock and tested with several variations on a branch - it's very spooky, one single debug line makes the difference - if this line is present all tests run fine - if not they fail with high probability:
https://github.com/apache/sling-org-apache-sling-testing-osgi-mock/blob/17245c3847ee00d824c53d35773ba051076f531b/core/src/main/java/org/apache/sling/testing/mock/osgi/MockBundleContext.java#L131

Stefan Seifert (JIRA)

unread,
Dec 7, 2021, 4:22:01 AM12/7/21
to wcm-i...@googlegroups.com

[ https://wcm-io.atlassian.net/browse/WTES-69?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=21406#comment-21406 ]

Stefan Seifert edited comment on WTES-69 at 12/7/21 9:21 AM:
-------------------------------------------------------------

i've disocvered that this problem still occurs.

strange thing is that it does not occur when DEBUG logging for osgi-mock is enabled.
it does occur in 30-50% of aem-mock test runs when DEBUG logging is disabled for osgi-mock (does not matter if DEBUG loggin for sling-mock/aem-mock is enabled or not).

i already did some thorough analyses of the actual debug log statements and ifDebugEnabled blocks in osgi-mock and tested with several variations on a branch - it's very spooky, one single debug line makes the difference - if this line is present all tests run fine - if not they fail with high probability:
https://github.com/apache/sling-org-apache-sling-testing-osgi-mock/blob/17245c3847ee00d824c53d35773ba051076f531b/core/src/main/java/org/apache/sling/testing/mock/osgi/MockBundleContext.java#L131

the actual occurences i've seen are:
* 'stringInterpolationProvider' (org.apache.sling.resourceresolver.impl.mapping.StringInterpolationProvider) for class org.apache.sling.resourceresolver.impl.ResourceResolverFactoryActivator (for AEM Cloud and AEM 6.11)
* 'resourceAccessSecurityTracker' (org.apache.sling.resourceresolver.impl.ResourceAccessSecurityTracker) for class org.apache.sling.resourceresolver.impl.ResourceResolverFactoryActivator (for AEM 6.5.0 where stringInterpolationProvider is not present)



was (Author: sseifert):

Stefan Seifert (JIRA)

unread,
Dec 7, 2021, 6:09:00 AM12/7/21
to wcm-i...@googlegroups.com

[ https://wcm-io.atlassian.net/browse/WTES-69?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=21407#comment-21407 ]

Stefan Seifert commented on WTES-69:
------------------------------------

first time i managed to capture a run with logging about a missing StringInterpolationProvider.
from the log it is seen that it is actually registered, but not found as service reference directly afterwards.

{noformat}
2021-12-07T10:16:18.1933969Z [main] DEBUG org.apache.sling.testing.mock.osgi.MockBundleContext - Creating MockBundleContext, bundleContext=org.apache.sling.testing.mock.osgi.MockBundleContext@674ad470
2021-12-07T10:16:18.1938572Z [main] DEBUG org.apache.sling.testing.mock.osgi.MockBundleContext - Register null (org.osgi.service.cm.ConfigurationAdmin), bundleContext=org.apache.sling.testing.mock.osgi.MockBundleContext@674ad470
2021-12-07T10:16:18.1943415Z [main] DEBUG org.apache.sling.testing.mock.osgi.MockBundleContext - Register null (org.osgi.service.event.EventAdmin), bundleContext=org.apache.sling.testing.mock.osgi.MockBundleContext@674ad470
2021-12-07T10:16:18.1948553Z [main] DEBUG org.apache.sling.testing.mock.sling.ThreadsafeMockAdapterManagerWrapper - Set bundle context for AdapterManager, bundleContext=org.apache.sling.testing.mock.osgi.MockBundleContext@674ad470
2021-12-07T10:16:18.1954218Z [main] DEBUG org.apache.sling.testing.mock.osgi.MockBundleContext - Register null (org.apache.sling.api.adapter.AdapterManager), bundleContext=org.apache.sling.testing.mock.osgi.MockBundleContext@674ad470
2021-12-07T10:16:18.1959471Z [main] DEBUG io.wcm.testing.mock.aem.context.ContextResourceResolverFactory - Start initialize resource resolver factory, bundleContext=org.apache.sling.testing.mock.osgi.MockBundleContext@674ad470
2021-12-07T10:16:18.1964215Z [main] DEBUG org.apache.sling.testing.mock.osgi.MockBundleContext - Register null (java.util.concurrent.Executor), bundleContext=org.apache.sling.testing.mock.osgi.MockBundleContext@674ad470
2021-12-07T10:16:18.1969024Z [main] DEBUG org.apache.sling.testing.mock.osgi.MockBundleContext - Register null (org.apache.sling.jcr.api.SlingRepository), bundleContext=org.apache.sling.testing.mock.osgi.MockBundleContext@674ad470
2021-12-07T10:16:18.1974635Z [main] DEBUG org.apache.sling.testing.mock.osgi.MockBundleContext - Register null (org.apache.sling.commons.classloader.DynamicClassLoaderManager), bundleContext=org.apache.sling.testing.mock.osgi.MockBundleContext@674ad470
2021-12-07T10:16:18.1980609Z [main] DEBUG org.apache.sling.testing.mock.osgi.MockBundleContext - Register null (org.apache.sling.spi.resource.provider.ResourceProvider), bundleContext=org.apache.sling.testing.mock.osgi.MockBundleContext@674ad470
2021-12-07T10:16:18.1986468Z [main] DEBUG org.apache.sling.testing.mock.osgi.MockBundleContext - Register null (org.apache.sling.serviceusermapping.ServiceUserMapper), bundleContext=org.apache.sling.testing.mock.osgi.MockBundleContext@674ad470
2021-12-07T10:16:18.1992772Z [main] DEBUG org.apache.sling.testing.mock.osgi.MockBundleContext - Register null (org.apache.sling.resourceresolver.impl.ResourceAccessSecurityTracker), bundleContext=org.apache.sling.testing.mock.osgi.MockBundleContext@674ad470
2021-12-07T10:16:18.2003138Z [pool-481-thread-1] DEBUG org.apache.sling.testing.mock.osgi.MockBundleContext - Register null (org.apache.sling.serviceusermapping.ServiceUserMapped), bundleContext=org.apache.sling.testing.mock.osgi.MockBundleContext@674ad470
2021-12-07T10:16:18.2009246Z [pool-481-thread-1] DEBUG org.apache.sling.testing.mock.osgi.MockBundleContext - Register null (org.apache.sling.serviceusermapping.ServiceUserMapped), bundleContext=org.apache.sling.testing.mock.osgi.MockBundleContext@674ad470
2021-12-07T10:16:18.2015812Z [main] DEBUG org.apache.sling.testing.mock.osgi.MockBundleContext - Register null (org.apache.sling.resourceresolver.impl.mapping.StringInterpolationProvider), bundleContext=org.apache.sling.testing.mock.osgi.MockBundleContext@674ad470
2021-12-07T10:16:18.2021806Z [main] ERROR io.wcm.testing.mock.aem.context.ContextResourceResolverFactory - Failed initializing resource resolver factory, bundleContext=org.apache.sling.testing.mock.osgi.MockBundleContext@674ad470
2021-12-07T10:16:18.2043707Z org.apache.sling.testing.mock.osgi.ReferenceViolationException: Unable to inject mandatory reference 'stringInterpolationProvider' (org.apache.sling.resourceresolver.impl.mapping.StringInterpolationProvider) for class org.apache.sling.resourceresolver.impl.ResourceResolverFactoryActivator : no matching services were found. bundleContext=org.apache.sling.testing.mock.osgi.MockBundleContext@674ad470
2021-12-07T10:16:18.2051076Z at org.apache.sling.testing.mock.osgi.OsgiServiceUtil.injectServiceReference(OsgiServiceUtil.java:626)
2021-12-07T10:16:18.2054359Z at org.apache.sling.testing.mock.osgi.OsgiServiceUtil.injectServices(OsgiServiceUtil.java:439)
2021-12-07T10:16:18.2057042Z at org.apache.sling.testing.mock.osgi.MockOsgi.injectServices(MockOsgi.java:163)
2021-12-07T10:16:18.2059389Z at org.apache.sling.testing.mock.osgi.MockOsgi.injectServices(MockOsgi.java:151)
2021-12-07T10:16:18.2064766Z at org.apache.sling.testing.mock.sling.ResourceResolverFactoryInitializer.initializeResourceResolverFactoryActivator(ResourceResolverFactoryInitializer.java:156)
2021-12-07T10:16:18.2071360Z at org.apache.sling.testing.mock.sling.ResourceResolverFactoryInitializer.setUp(ResourceResolverFactoryInitializer.java:81)
2021-12-07T10:16:18.2075227Z at org.apache.sling.testing.mock.sling.MockSling.newResourceResolverFactory(MockSling.java:87)
2021-12-07T10:16:18.2078742Z at io.wcm.testing.mock.aem.context.ContextResourceResolverFactory.get(ContextResourceResolverFactory.java:55)
2021-12-07T10:16:18.2082247Z at io.wcm.testing.mock.aem.context.AemContextImpl.newResourceResolverFactory(AemContextImpl.java:99)
2021-12-07T10:16:18.2086180Z at org.apache.sling.testing.mock.sling.context.SlingContextImpl.resourceResolverFactory(SlingContextImpl.java:151)
2021-12-07T10:16:18.2089900Z at org.apache.sling.testing.mock.sling.context.SlingContextImpl.setUp(SlingContextImpl.java:135)
2021-12-07T10:16:18.2092487Z at io.wcm.testing.mock.aem.context.AemContextImpl.setUp(AemContextImpl.java:104)
2021-12-07T10:16:18.2094459Z at io.wcm.testing.mock.aem.junit.AemContext.access$400(AemContext.java:56)
2021-12-07T10:16:18.2096138Z at io.wcm.testing.mock.aem.junit.AemContext$2.execute(AemContext.java:208)
2021-12-07T10:16:18.2097693Z at io.wcm.testing.mock.aem.junit.AemContext$2.execute(AemContext.java:203)
2021-12-07T10:16:18.2100040Z at io.wcm.testing.junit.rules.parameterized.RepeatedStatement.evaluate(RepeatedStatement.java:55)
2021-12-07T10:16:18.2102432Z at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
2021-12-07T10:16:18.2104083Z at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
2021-12-07T10:16:18.2106068Z at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
2021-12-07T10:16:18.2108087Z at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
2021-12-07T10:16:18.2110359Z at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
2021-12-07T10:16:18.2112126Z at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
2021-12-07T10:16:18.2113404Z at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
2021-12-07T10:16:18.2114898Z at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
2021-12-07T10:16:18.2116570Z at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
2021-12-07T10:16:18.2117971Z at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
2021-12-07T10:16:18.2119473Z at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
2021-12-07T10:16:18.2120838Z at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
2021-12-07T10:16:18.2122697Z at org.apache.maven.surefire.junit4.JUnit4Provider.execute(JUnit4Provider.java:364)
2021-12-07T10:16:18.2125317Z at org.apache.maven.surefire.junit4.JUnit4Provider.executeWithRerun(JUnit4Provider.java:272)
2021-12-07T10:16:18.2128121Z at org.apache.maven.surefire.junit4.JUnit4Provider.executeTestSet(JUnit4Provider.java:237)
2021-12-07T10:16:18.2130624Z at org.apache.maven.surefire.junit4.JUnit4Provider.invoke(JUnit4Provider.java:158)
2021-12-07T10:16:18.2133411Z at org.apache.maven.surefire.booter.ForkedBooter.runSuitesInProcess(ForkedBooter.java:428)
2021-12-07T10:16:18.2135971Z at org.apache.maven.surefire.booter.ForkedBooter.execute(ForkedBooter.java:162)
2021-12-07T10:16:18.2138058Z at org.apache.maven.surefire.booter.ForkedBooter.run(ForkedBooter.java:562)
2021-12-07T10:16:18.2144801Z at org.apache.maven.surefire.booter.ForkedBooter.main(ForkedBooter.java:548)
2021-12-07T10:16:18.2148455Z [main] DEBUG org.apache.sling.testing.mock.osgi.MockBundleContext - Shutting down MockBundleContext, bundleContext=org.apache.sling.testing.mock.osgi.MockBundleContext@674ad470
{noformat}

Stefan Seifert (JIRA)

unread,
Dec 7, 2021, 6:59:41 AM12/7/21
to wcm-i...@googlegroups.com

[ https://wcm-io.atlassian.net/browse/WTES-69?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=21408#comment-21408 ]

Stefan Seifert commented on WTES-69:
------------------------------------

i think i found the root cause issue and fixed it in https://issues.apache.org/jira/browse/SLING-10973

the MockServiceRegistration class used an integer field to generate a unique service ID (incrementing long) for each new service registration. this id was also used as comparison/equals key in the hash set for storing the actual service registrations in the bundle context.

this ID was generated not in a thread-safe manner. this is normally no problem as all operations in osgi-mock are done synchronously (except sending OSGi events). but as we seen in the log above, the sling component org.apache.sling.resourceresolver.impl.ResourceAccessSecurityTracker seems to launch a separate thread (i suppose it's an executor service) to register additional sling mappings. directly after this, the StringInterpolationProvider was provided.

so it may happen that two service registrations (e.g. a service user mapping and the string interpolation provider) get the exactly same service ID, and so one of them is omitted when adding to the set of service registrations.

Stefan Seifert (JIRA)

unread,
Dec 7, 2021, 9:01:40 AM12/7/21
to wcm-i...@googlegroups.com

[ https://wcm-io.atlassian.net/browse/WTES-69?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Stefan Seifert resolved WTES-69.
--------------------------------
Fix versions: AEM Mocks 4.1.8
Resolution: Fixed

> AEM Mocks Build fail sporadically with "Unable to inject mandatory reference" error
> -----------------------------------------------------------------------------------
>
> Key: WTES-69
> URL: https://wcm-io.atlassian.net/browse/WTES-69
> Project: wcm.io Testing
> Issue Type: Bug
> Components: AEM Mocks
> Reporter: Stefan Seifert
> Assignee: Stefan Seifert
> Fix For: AEM Mocks 4.1.6, AEM Mocks 4.1.8
Reply all
Reply to author
Forward
0 new messages