Retrieve all resources of a type in a JPA Server

Caleb Steele-Lane

Aug 25, 2021, 5:28:39 PM
to HAPI FHIR

Hi there,

I am trying to iterate through every resource of a given type stored in a HAPI FHIR JPA Server, but paging seems to stop long before it has returned all the records. Is there an expected way to do this, or a setting I am missing that would allow my method to work?

The code I am using is as follows:

List<Bundle> searchBundles = new ArrayList<>();

// Fetch the first page of results for the given resource type
Bundle searchBundle = client.search()
        .forResource(resourceType)
        .returnBundle(Bundle.class)
        .execute();
searchBundles.add(searchBundle);

// Follow the "next" links until the server stops returning one
while (searchBundle.getLink(IBaseBundle.LINK_NEXT) != null) {
    searchBundle = client.loadPage().next(searchBundle).execute();
    searchBundles.add(searchBundle);
}


I have around 400,000 resources of this type, but when I count the entries across all the searchBundles, there are only around 3,000 objects.
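
For reference, I count them roughly like this (a quick sketch; Bundle.total is only a hint unless the server actually computes it):

int fetched = searchBundles.stream()
        .mapToInt(bundle -> bundle.getEntry().size())
        .sum();
// Bundle.total reports the server's match count, when the server chooses to calculate it
int reportedTotal = searchBundles.get(0).getTotal();
System.out.println("fetched " + fetched + " of ~" + reportedTotal);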

Properties that may be relevant:
    default_page_size: 100
    max_page_size: 200
    retain_cached_searches_mins: 60
    reuse_cached_search_results_millis: 1

Any help would be greatly appreciated,
Caleb

Caleb Steele-Lane

Aug 25, 2021, 5:30:05 PM
to HAPI FHIR
Full application.yaml

spring:
  datasource:
    url: 'jdbc:postgresql://[% db_host %]:[% db_port %]/clinlims?currentSchema=clinlims'
    #url: jdbc:h2:mem:test_mem
    username: clinlims
    password: [% db_password %]
    driverClassName: org.postgresql.Driver
    max-active: 15
  jpa:
    properties:
      hibernate.dialect: org.hibernate.dialect.PostgreSQLDialect
#      hibernate.search.model_mapping: ca.uhn.fhir.jpa.search.lucenesearchmappingfactory
#      hibernate.format_sql: false
      hibernate.show_sql: false
      hibernate.hbm2ddl.auto: update
#      hibernate.jdbc.batch_size: 20
#      hibernate.cache.use_query_cache: false
#      hibernate.cache.use_second_level_cache: false
#      hibernate.cache.use_structured_entries: false
#      hibernate.cache.use_minimal_puts: false
#      hibernate.search.default.directory_provider: filesystem
#      hibernate.search.default.indexbase: target/lucenefiles
#      hibernate.search.lucene_version: lucene_current

  batch:
    job:
      enabled: false
      
#server:
#  port: 8443
#  ssl:
#    client-auth: need
#    key-store: file:/run/secrets/keystore
#    key-store-password:
#    key-password:
#    trust-store: file:/run/secrets/truststore
#    trust-store-password:

hapi:
  fhir:
    ### enable to set the Server URL
    server_address: [% local_fhir_server_address %]
    ### This is the FHIR version. Choose between, DSTU2, DSTU3, R4 or R5
    fhir_version: R4
#    defer_indexing_for_codesystems_of_size: 101
    #implementationguides:
      #example from registry (packages.fhir.org)
      #swiss:
        #name: swiss.mednet.fhir
        #version: 0.8.0
      #example not from registry
      #ips_1_0_0:
        #url: https://build.fhir.org/ig/HL7/fhir-ips/package.tgz
        #name: hl7.fhir.uv.ips
        #version: 1.0.0

    #supported_resource_types:
    #  - Patient
    #  - Observation
#    allow_cascading_deletes: true
    allow_contains_searches: true
    allow_external_references: true
#    allow_multiple_delete: true
#    allow_override_default_search_params: true
    allow_placeholder_references: true
    auto_create_placeholder_reference_targets: true
#    default_encoding: JSON
    default_pretty_print: false
    default_page_size: 100
#    enable_index_missing_fields: false
#    enforce_referential_integrity_on_delete: false
#    enforce_referential_integrity_on_write: false
#    etag_support_enabled: true
#    expunge_enabled: true
#    daoconfig_client_id_strategy: null
    fhirpath_interceptor_enabled: false
#    filter_search_enabled: true
#    graphql_enabled: true
#    narrative_enabled: true
    #partitioning:
    #  allow_references_across_partitions: false
    #  partitioning_include_in_search_hashes: false
    #cors:
    #  allow_Credentials: true
      # Supports multiple, comma separated allowed origin entries
      # cors.allowed_origin=http://localhost:8080,https://localhost:8080,https://fhirtest.uhn.ca
    #  allowed_origin:
    #    - '*'

#    logger:
#      error_format: 'ERROR - ${requestVerb} ${requestUrl}'
#      format: >-
#        Path[${servletPath}] Source[${requestHeader.x-forwarded-for}]
#        Operation[${operationType} ${operationName} ${idOrResourceName}]
#        UA[${requestHeader.user-agent}] Params[${requestParameters}]
#        ResponseEncoding[${responseEncodingNoDefault}]
#      log_exceptions: true
#      name: fhirtest.access
#    max_binary_size: 104857600

    max_page_size: 200
    retain_cached_searches_mins: 60
    reuse_cached_search_results_millis: 1
    tester:

        home:
          name: OE adjacent FHIR Store
          server_address: '[% local_fhir_server_address %]'
          refuse_to_fetch_third_party_urls: false
          fhir_version: R4

#    validation:
#      requests_enabled: true
#      responses_enabled: true
#    binary_storage_enabled: true
#    bulk_export_enabled: true
    subscription:
      resthook_enabled: true
#      websocket_enabled: false
#      email:
#        from: so...@test.com
#        host: google.com
#        port:
#        username:
#        password:
#        auth:
#        startTlsEnable:
#        startTlsRequired:
#        quitWait:
#    lastn_enabled: true


#
#elasticsearch:
#  debug:
#    pretty_print_json_log: false
#    refresh_after_write: false
#  enabled: false
#  password: SomePassword
#  required_index_status: YELLOW
#  rest_url: 'http://localhost:9200'
#  schema_management_strategy: CREATE
#  username: SomeUsername

James Agnew

Aug 26, 2021, 9:52:41 AM
to Caleb Steele-Lane, HAPI FHIR
Hmm, weird.

There are two settings in hapi-fhir-jpaserver-base that would control the maximum number of resources that can be returned by a search: SearchPreFetchThresholds and FetchSizeDefaultMaximum.

Looking at the code in the jpaserver-starter, the latter setting seems to be mapped to the property "default_page_size", which seems badly named. You could certainly try removing this setting from your yaml.
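
For reference, the rough DaoConfig equivalents of those two settings (a sketch; the exact mapping in the starter may differ):

import java.util.Arrays;
import ca.uhn.fhir.jpa.api.config.DaoConfig;

DaoConfig daoConfig = new DaoConfig();
// FetchSizeDefaultMaximum caps how many results a single search can ever
// return; null (the default) means no cap.
daoConfig.setFetchSizeDefaultMaximum(null);
// SearchPreFetchThresholds sets the staged prefetch sizes; a trailing -1
// means the last stage fetches everything. (13, 503, 2003, -1 are the
// defaults, if I recall correctly.)
daoConfig.setSearchPreFetchThresholds(Arrays.asList(13, 503, 2003, -1));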


Caleb Steele-Lane

Aug 26, 2021, 9:37:37 PM
to HAPI FHIR
Thanks, I will try this next.

I was able to get many more records by commenting out "reuse_cached_search_results_millis", but still not all of them.

Is the search's "next" link only preserved for a length of time related to this option?

James Agnew

Aug 27, 2021, 5:45:23 AM
to Caleb Steele-Lane, HAPI FHIR
No, it's the "retain_cached_searches_mins" setting that controls how long after a search is performed its paging links are still honoured.
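
If it helps, the corresponding DaoConfig calls (a sketch, assuming the starter maps these yaml properties straight onto DaoConfig):

import ca.uhn.fhir.jpa.api.config.DaoConfig;

DaoConfig daoConfig = new DaoConfig();
// Cached searches, and with them their paging links, expire after this long;
// 60 minutes matches retain_cached_searches_mins: 60 in the yaml above.
daoConfig.setExpireSearchResultsAfterMillis(60L * 60L * 1000L);
// A separate concern: how long an identical search may be answered from the
// cache instead of being re-run (reuse_cached_search_results_millis).
daoConfig.setReuseCachedSearchResultsForMillis(1L);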

Cheers,
James

Andrew Guselnikov

Jul 11, 2022, 7:13:37 AM
to HAPI FHIR
Hello, 

I'm stuck with a similar problem.

The case is the following:

1. There are a lot of resources (~10M) of one type on the FHIR server, let's say Claim.
2. I need to fetch them all to update one field, because of an erroneous previous upload.
3. I wrote a simple app that performs a [base]/Claim REST call and then follows the next-page links, fetching resources page by page.

At some point processing just stops; as a result I got only ~350K resources.

I found that there is an option, SearchPreFetchThresholds, which (if I'm not mistaken) regulates the number of items that HAPI tries to fetch ahead of time:

search_pre_fetch_thresholds=500,2000,10000,50000,100000,300000,600000,1000000,1500000,2000000,-1

I suspect that when there are a lot of items, this mechanism tries to prefetch some very large number and is not able to do so.

I was able to get more results by changing the max_millis_to_wait_for_remote_results option, but still not all of them.

If I instead write something like search_pre_fetch_thresholds=500, so that the prefetch is constant, I can see that I get only 500 items back, but I don't understand why...
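
If I'm reading the DaoConfig javadoc right, the last threshold acts as a hard maximum unless it is -1, which would explain the 500-item cap. A sketch (assuming search_pre_fetch_thresholds maps onto DaoConfig.setSearchPreFetchThresholds):

import java.util.Arrays;
import ca.uhn.fhir.jpa.api.config.DaoConfig;

DaoConfig daoConfig = new DaoConfig();
// A single stage with no trailing -1: 500 becomes the hard cap, so the
// search never returns more than 500 results.
daoConfig.setSearchPreFetchThresholds(Arrays.asList(500));
// With a trailing -1 the final stage is unbounded:
daoConfig.setSearchPreFetchThresholds(Arrays.asList(500, 2000, -1));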


Caleb, were you able to retrieve all the resources?

On Friday, August 27, 2021 at 12:45:23 UTC+3, james...@gmail.com wrote: