SS in hospital registry data /Electronic health record (EHR)

MANOJ KUMAR

Aug 5, 2024, 1:05:10 PM

to meds...@googlegroups.com

Dear All,

Greetings!

I am currently working on a study involving hospital registry data / electronic health records (EHR), and I am seeking guidance on whether a sample size calculation is necessary for such data sets and, if so, how to carry it out. Do we need a sample size calculation? Could you please provide insights on this?

Your expertise and guidance on this matter would be greatly appreciated. I am keen to ensure that our study is methodologically sound and that we have sufficient data to achieve reliable and valid results.

Thanking you in advance,

Manoj Kumar

University of Pittsburgh

Best Regards,

Dr. Manoj Kumar Diwakar, M.Sc., M.Phil.,Ph.D. (Statistics)
Assistant Professor
Centre for Economic Studies & Planning (CESP), School of Social Sciences (SSS-II),
Jawaharlal Nehru University, New Delhi-110067, India.

Email id: manojkumar@jnu.ac.in

Mobile-09990346151

Area of Specialisation: Statistics, Econometrics and Applied Mathematics

Research Methodology -Quantitative Methods, Health Economics, Clinical Trial-Biostatistics

Data analysis and Software: SAS, SPSS, R, STATA, SPSS AMOS

MANOJ KUMAR

Aug 5, 2024, 2:28:35 PM

to Marc Schwartz, meds...@googlegroups.com

Thanks, Marc, for your view on EHR data.

I am asking specifically about retrospective studies of EHR or hospital registry data, and about how the sample size should be addressed when writing the manuscript.

On Mon, Aug 5, 2024 at 2:01 PM Marc Schwartz <marc_s...@me.com> wrote:

Hi,

My impression is that this will be a retrospective study acquiring your data from pre-existing electronic records. If that is not correct and this will be a prospective study, please correct me.

For a retrospective study, especially if the data are already available electronically rather than having to be manually abstracted from charts, there is no disincentive (e.g. time, cost, labor) to getting as much data as you can.

The key will be avoiding patient selection bias by obtaining all patients who meet your clinical inclusion/exclusion criteria, over a time period that makes sense clinically. Consider whether changes in the patient population and/or in the practice of medicine during that time period might confound your results, unless studying those changes is also part of your study intent. You may also need to consider any minimum follow-up times that are relevant to your study goals.

If you have not already, you might begin with a query to the parties that hold the data to see how many patients conform to your inclusion/exclusion criteria for specific time periods (e.g. 1 year, 3 years, 5 years, 10 years) to get an initial sense of how many patients would be available. You should be able to obtain some indication as to whether to expect, for example, 100 patients, 500 patients, 1,000 patients or 5,000 patients, given some knowledge of the patient cohort of interest.

Based upon that information and the expected size of the cohort, you can, in essence, back fit 95% confidence intervals around various point estimates given some assumptions about the incidence of relevant characteristics for dichotomous variables. That way, based upon the expected sample size, you can provide a table in your protocol of some estimates of the precision around the variables that you might expect and then use that to support your sample size. For example, you might include 1%, 5%, 10%, 20%, 30% and 40%, with their 95% confidence intervals, to give a sense of the expected precision around each proportion.

This is what I have done in the past, even with prospective observational studies, where some justification for a sample size is desired but no formal null hypotheses are pre-specified that would call for formal power/sample size estimates.
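As a rough illustration of the precision table described above, here is a minimal Python sketch. It assumes the Wilson score interval (the post does not say which interval method was used) and uses the example cohort sizes and proportions mentioned in this message. The function names `wilson_ci` and `precision_table` are made up for this example.

```python
from math import sqrt
from statistics import NormalDist


def wilson_ci(p, n, level=0.95):
    """Wilson score interval for an assumed proportion p at sample size n.

    The choice of the Wilson method is an assumption of this sketch.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


def precision_table(sizes, proportions, level=0.95):
    """Return (n, p, lower, upper) rows for every size/proportion pair."""
    rows = []
    for n in sizes:
        for p in proportions:
            lower, upper = wilson_ci(p, n, level)
            rows.append((n, p, lower, upper))
    return rows


# Candidate cohort sizes and assumed proportions taken from the examples above.
sizes = [100, 500, 1000, 5000]
proportions = [0.01, 0.05, 0.10, 0.20, 0.30, 0.40]

for n, p, lower, upper in precision_table(sizes, proportions):
    print(f"n={n:>5}  p={p:5.1%}  95% CI: {lower:6.2%} to {upper:6.2%}  "
          f"width={upper - lower:6.2%}")
```

Each row shows how wide the interval around an assumed proportion would be at a given cohort size, which is the kind of table that can go into a protocol to support the expected sample size.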

Regards,

Marc Schwartz


MANOJ KUMAR

Aug 5, 2024, 3:21:15 PM

to Marc Schwartz, meds...@googlegroups.com

Thanks for the clarification.

On Mon, Aug 5, 2024 at 2:51 PM Marc Schwartz <marc_s...@me.com> wrote:

Hi Manoj,

With respect to the manuscript itself, whatever you get approved for the protocol, in terms of any justification for the sample size, can then be reflected in the methods section of the manuscript.

If there is no specific justification in the protocol, and it is just a matter of the sample size that you obtained based upon your inclusion/exclusion criteria and any related parameters, then describe that in the methods section of the manuscript.

Regards,

Marc

--
--
To post a new thread to MedStats, send email to MedS...@googlegroups.com .
MedStats' home page is http://groups.google.com/group/MedStats .
Rules: http://groups.google.com/group/MedStats/web/medstats-rules

---
You received this message because you are subscribed to the Google Groups "MedStats" group.
To unsubscribe from this group and stop receiving emails from it, send an email to medstats+u...@googlegroups.com.
To view this discussion on the web, visit https://groups.google.com/d/msgid/medstats/CA%2ByqjKvAxmM2_aGQYb%3DrKrVSJLzeKR8cwreR-fQq1ZiRFC0HWw%40mail.gmail.com.

alejandro munoz

Aug 6, 2024, 12:47:04 PM

to meds...@googlegroups.com, Marc Schwartz

Manoj,

Probably goes without saying, but I just wanted to make it explicit: the methodological soundness and the validity and reliability of your results will depend on much more than just your sample size. I suggest you consult STROBE or whichever EQUATOR guideline most closely matches your study type:

https://www.equator-network.org/reporting-guidelines/strobe/

In my experience, you'll use all available data, subject to a sensible start date: when data collection started in earnest, when the EHR was adopted and the data "stabilized", or when a given documentation or clinical practice was widely adopted. It could also be the creation date of your registry. As Marc suggested, it's still a good idea to compute power, if you have an idea about the effect estimates you expect to see.
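For the power computation, a minimal Python sketch might look like the following. It assumes a two-sided comparison of two proportions with equal group sizes, using the usual normal approximation. The function name `power_two_proportions` and all input values are hypothetical and are not taken from any study in this thread.

```python
from math import sqrt
from statistics import NormalDist


def power_two_proportions(p1, p2, n_per_group, alpha=0.05):
    """Approximate power of a two-sided test comparing two proportions.

    Assumes equal group sizes and the normal approximation; the
    negligible contribution of the opposite tail is ignored.
    """
    nd = NormalDist()
    z_alpha = nd.inv_cdf(1 - alpha / 2)
    p_bar = (p1 + p2) / 2
    # Standard error under the null (pooled) and under the alternative.
    se_null = sqrt(2 * p_bar * (1 - p_bar) / n_per_group)
    se_alt = sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / n_per_group)
    return nd.cdf((abs(p1 - p2) - z_alpha * se_null) / se_alt)


# Hypothetical inputs: event rates of 10% vs 15% at several group sizes.
for n in (200, 686, 2000, 25000):
    print(f"n per group = {n:>6}: power = {power_two_proportions(0.10, 0.15, n):.3f}")
```

With a fixed, already available cohort, the same function can be run the other way round: hold the group sizes at what the registry provides and vary the assumed effect to see which differences could be detected reliably.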

Best,

Alejandro

MANOJ KUMAR

Aug 6, 2024, 5:01:18 PM

to meds...@googlegroups.com, Marc Schwartz

Dear Dr. Alejandro,

Thanks for your input, but my understanding is that STROBE is meant for cohort, case-control, and cross-sectional studies, not for EHR/longitudinal studies.

For example, I am working with 50K patients with wounds, drawn from 10 years of registry/EHR data. Here I think there is no need for a sample size calculation, as I am taking all the data from that period that meet my inclusion and exclusion criteria.

Thanks,
