Retrieve all resources of a type in a JPA Server

Caleb Steele-Lane

Aug 25, 2021, 5:28:39 PM
to HAPI FHIR

Hi there,

I am trying to iterate through every resource of a given type stored in a HAPI FHIR JPA Server, but paging seems to stop long before it has returned all the records. Is there an expected way to do this, or a setting I am missing that would allow my method to work?

The code I am using is as follows:

List<Bundle> searchBundles = new ArrayList<>();

// Fetch the first page of results for the given resource type
Bundle searchBundle = client.search()
        .forResource(resourceType)
        .returnBundle(Bundle.class)
        .execute();
searchBundles.add(searchBundle);

// Follow the "next" links until the server stops returning one
while (searchBundle.getLink(IBaseBundle.LINK_NEXT) != null) {
    searchBundle = client.loadPage().next(searchBundle).execute();
    searchBundles.add(searchBundle);
}


I have around 400,000 resources of this type, but when I count the entries across all the searchBundles, there are only around 3,000 objects.
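
For reference, I count them roughly like this (a quick sketch; Bundle.total is only a hint unless the server actually computes it):

int fetched = searchBundles.stream()
        .mapToInt(bundle -> bundle.getEntry().size())
        .sum();
// Bundle.total reports the server's match count, when the server chooses to calculate it
int reportedTotal = searchBundles.get(0).getTotal();
System.out.println("fetched " + fetched + " of ~" + reportedTotal);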

Properties that may be relevant:
    default_page_size: 100
    max_page_size: 200
    retain_cached_searches_mins: 60
    reuse_cached_search_results_millis: 1

Any help would be greatly appreciated,
Caleb

Caleb Steele-Lane

Aug 25, 2021, 5:30:05 PM
to HAPI FHIR
Full application.yaml

spring:
  datasource:
    url: 'jdbc:postgresql://[% db_host %]:[% db_port %]/clinlims?currentSchema=clinlims'
    #url: jdbc:h2:mem:test_mem
    username: clinlims
    password: [% db_password %]
    driverClassName: org.postgresql.Driver
    max-active: 15
  jpa:
    properties:
      hibernate.dialect: org.hibernate.dialect.PostgreSQLDialect
#      hibernate.search.model_mapping: ca.uhn.fhir.jpa.search.lucenesearchmappingfactory
#      hibernate.format_sql: false
      hibernate.show_sql: false
      hibernate.hbm2ddl.auto: update
#      hibernate.jdbc.batch_size: 20
#      hibernate.cache.use_query_cache: false
#      hibernate.cache.use_second_level_cache: false
#      hibernate.cache.use_structured_entries: false
#      hibernate.cache.use_minimal_puts: false
#      hibernate.search.default.directory_provider: filesystem
#      hibernate.search.default.indexbase: target/lucenefiles
#      hibernate.search.lucene_version: lucene_current

  batch:
    job:
      enabled: false
      
#server:
#  port: 8443
#  ssl:
#    client-auth: need
#    key-store: file:/run/secrets/keystore
#    key-store-password:
#    key-password:
#    trust-store: file:/run/secrets/truststore
#    trust-store-password:

hapi:
  fhir:
    ### enable to set the Server URL
    server_address: [% local_fhir_server_address %]
    ### This is the FHIR version. Choose between, DSTU2, DSTU3, R4 or R5
    fhir_version: R4
#    defer_indexing_for_codesystems_of_size: 101
    #implementationguides:
      #example from registry (packages.fhir.org)
      #swiss:
        #name: swiss.mednet.fhir
        #version: 0.8.0
      #example not from registry
      #ips_1_0_0:
        #url: https://build.fhir.org/ig/HL7/fhir-ips/package.tgz
        #name: hl7.fhir.uv.ips
        #version: 1.0.0

    #supported_resource_types:
    #  - Patient
    #  - Observation
#    allow_cascading_deletes: true
    allow_contains_searches: true
    allow_external_references: true
#    allow_multiple_delete: true
#    allow_override_default_search_params: true
    allow_placeholder_references: true
    auto_create_placeholder_reference_targets: true
#    default_encoding: JSON
    default_pretty_print: false
    default_page_size: 100
#    enable_index_missing_fields: false
#    enforce_referential_integrity_on_delete: false
#    enforce_referential_integrity_on_write: false
#    etag_support_enabled: true
#    expunge_enabled: true
#    daoconfig_client_id_strategy: null
    fhirpath_interceptor_enabled: false
#    filter_search_enabled: true
#    graphql_enabled: true
#    narrative_enabled: true
    #partitioning:
    #  allow_references_across_partitions: false
    #  partitioning_include_in_search_hashes: false
    #cors:
    #  allow_Credentials: true
      # Supports multiple, comma separated allowed origin entries
      # cors.allowed_origin=http://localhost:8080,https://localhost:8080,https://fhirtest.uhn.ca
    #  allowed_origin:
    #    - '*'

#    logger:
#      error_format: 'ERROR - ${requestVerb} ${requestUrl}'
#      format: >-
#        Path[${servletPath}] Source[${requestHeader.x-forwarded-for}]
#        Operation[${operationType} ${operationName} ${idOrResourceName}]
#        UA[${requestHeader.user-agent}] Params[${requestParameters}]
#        ResponseEncoding[${responseEncodingNoDefault}]
#      log_exceptions: true
#      name: fhirtest.access
#    max_binary_size: 104857600

    max_page_size: 200
    retain_cached_searches_mins: 60
    reuse_cached_search_results_millis: 1
    tester:

        home:
          name: OE adjacent FHIR Store
          server_address: '[% local_fhir_server_address %]'
          refuse_to_fetch_third_party_urls: false
          fhir_version: R4

#    validation:
#      requests_enabled: true
#      responses_enabled: true
#    binary_storage_enabled: true
#    bulk_export_enabled: true
    subscription:
      resthook_enabled: true
#      websocket_enabled: false
#      email:
#        from: so...@test.com
#        host: google.com
#        port:
#        username:
#        password:
#        auth:
#        startTlsEnable:
#        startTlsRequired:
#        quitWait:
#    lastn_enabled: true


#
#elasticsearch:
#  debug:
#    pretty_print_json_log: false
#    refresh_after_write: false
#  enabled: false
#  password: SomePassword
#  required_index_status: YELLOW
#  rest_url: 'http://localhost:9200'
#  schema_management_strategy: CREATE
#  username: SomeUsername

James Agnew

Aug 26, 2021, 9:52:41 AM
to Caleb Steele-Lane, HAPI FHIR
Hmm, weird.

There are two settings in hapi-fhir-jpaserver-base that would control the maximum number of resources that can be returned by a search: SearchPreFetchThresholds and FetchSizeDefaultMaximum.

Looking at the code in the jpaserver-starter, the latter setting seems to be mapped to the property "default_page_size", which seems badly named. You could certainly try removing this setting from your yaml.
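
For reference, the rough DaoConfig equivalents of those two settings (a sketch; the exact mapping in the starter may differ):

import java.util.Arrays;
import ca.uhn.fhir.jpa.api.config.DaoConfig;

DaoConfig daoConfig = new DaoConfig();
// FetchSizeDefaultMaximum caps how many results a single search can ever
// return; null (the default) means no cap.
daoConfig.setFetchSizeDefaultMaximum(null);
// SearchPreFetchThresholds sets the staged prefetch sizes; a trailing -1
// means the last stage fetches everything. (13, 503, 2003, -1 are the
// defaults, if I recall correctly.)
daoConfig.setSearchPreFetchThresholds(Arrays.asList(13, 503, 2003, -1));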


Caleb Steele-Lane

Aug 26, 2021, 9:37:37 PM
to HAPI FHIR
Thanks, I will try this next.

I was able to get many more records by commenting out "reuse_cached_search_results_millis", but still not all of them.

Is the search's "next" link only preserved for a length of time related to this option?

James Agnew

Aug 27, 2021, 5:45:23 AM
to Caleb Steele-Lane, HAPI FHIR
No, it's the "retain_cached_searches_mins" setting that controls how long after a search is performed its paging links are still honoured.
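
If it helps, the corresponding DaoConfig calls (a sketch, assuming the starter maps these yaml properties straight onto DaoConfig):

import ca.uhn.fhir.jpa.api.config.DaoConfig;

DaoConfig daoConfig = new DaoConfig();
// Cached searches, and with them their paging links, expire after this long;
// 60 minutes matches retain_cached_searches_mins: 60 in the yaml above.
daoConfig.setExpireSearchResultsAfterMillis(60L * 60L * 1000L);
// A separate concern: how long an identical search may be answered from the
// cache instead of being re-run (reuse_cached_search_results_millis).
daoConfig.setReuseCachedSearchResultsForMillis(1L);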

Cheers,
James

Andrew Guselnikov

Jul 11, 2022, 7:13:37 AM
to HAPI FHIR
Hello, 

I'm stuck with a similar problem.

The case is the following:

1. There are a lot of resources (~10M) of one type on the FHIR server, let's say Claim.
2. I need to fetch them all to update one field, because of an erroneous previous upload.
3. I wrote a simple app that performs a [base]/Claim REST call and then follows the next-page links, fetching resources page by page.

At some point processing just stops; as a result I got only ~350K resources.

I found that there is an option, SearchPreFetchThresholds, which (if I'm not mistaken) regulates the number of items that HAPI tries to fetch ahead of time:

search_pre_fetch_thresholds=500,2000,10000,50000,100000,300000,600000,1000000,1500000,2000000,-1

I suspect that when there are a lot of items, this mechanism tries to prefetch some very large number and is not able to do so.

I was able to get more results by changing the max_millis_to_wait_for_remote_results option, but still not all of them.

If I instead write something like search_pre_fetch_thresholds=500, so that the prefetch is constant, I can see that I get only 500 items back, but I don't understand why...
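
If I'm reading the DaoConfig javadoc right, the last threshold acts as a hard maximum unless it is -1, which would explain the 500-item cap. A sketch (assuming search_pre_fetch_thresholds maps onto DaoConfig.setSearchPreFetchThresholds):

import java.util.Arrays;
import ca.uhn.fhir.jpa.api.config.DaoConfig;

DaoConfig daoConfig = new DaoConfig();
// A single stage with no trailing -1: 500 becomes the hard cap, so the
// search never returns more than 500 results.
daoConfig.setSearchPreFetchThresholds(Arrays.asList(500));
// With a trailing -1 the final stage is unbounded:
daoConfig.setSearchPreFetchThresholds(Arrays.asList(500, 2000, -1));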


Caleb, were you able to retrieve all the resources?

On Friday, August 27, 2021 at 12:45:23 UTC+3, james...@gmail.com wrote: