Tesseract for OCR of CT dose summary images


David Platten

Dec 22, 2016, 11:47:33 AM
to OpenREM
I'm trying to use Tesseract (http://github.com/tesseract-ocr/tesseract) to carry out optical character recognition on Toshiba CT scanner dose summary images. The DICOM tags of these summaries don't contain useful dose information, so I'm resorting to OCR.

I'm having trouble getting Tesseract to work properly. I am currently converting the bitmap image contained in the DICOM file to a PNG file, enlarging it by a factor of 15 using ImageMagick's convert command, and then running Tesseract on it. The numbers are correctly identified, but the associated text has lots of errors.
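For anyone wanting to reproduce the enlarge-then-OCR steps described above, here is a minimal sketch in Python. It only builds the two command lines; the file names, the "_enlarged" and "_ocr" suffixes and the "eng" language code are assumptions for illustration, and the DICOM-to-PNG extraction is not shown.

```python
from pathlib import Path


def build_ocr_commands(png_in, scale_factor=15, lang="eng"):
    """Build the ImageMagick and Tesseract command lines for one image.

    Returns (convert_cmd, tesseract_cmd) as argument lists suitable for
    subprocess.run. Output names are hypothetical, derived from the input.
    """
    src = Path(png_in)
    enlarged = src.with_name(src.stem + "_enlarged.png")
    out_base = src.with_name(src.stem + "_ocr")  # Tesseract appends .txt

    # A 15x enlargement is a resize to 1500 percent.
    convert_cmd = ["convert", str(src), "-resize",
                   f"{scale_factor * 100}%", str(enlarged)]
    tesseract_cmd = ["tesseract", str(enlarged), str(out_base), "-l", lang]
    return convert_cmd, tesseract_cmd


# To actually run the steps (requires both tools on the PATH):
#   import subprocess
#   for cmd in build_ocr_commands("dose_summary.png"):
#       subprocess.run(cmd, check=True)
```

Keeping the commands as argument lists rather than shell strings avoids quoting problems with file names that contain spaces.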

 

Is anyone able to offer any advice or help with this? I can supply an example dose summary from a QA scan.


Many thanks, and Happy Christmas,


David


Ed McDonagh

Dec 22, 2016, 3:31:55 PM
to OpenREM
Hi David

My first port of call for this would be David Clunie's DoseUtility (http://www.dclunie.com/pixelmed/software/webstart/DoseUtilityUsage.html). I have used it previously to great effect with Siemens and GE 'dose screen' images (and still do with GE). I've not tried it with Toshiba, though the instructions suggest it should work. Have you tried it?

I did try it on the images you sent me, and for some reason I couldn't get it to recognise that there was dose data there. If you also can't get it to work, then it would be worth asking David if there is any reason for this, and offering to send him some examples. The way to do that is to email the Yahoo group, I think: https://groups.yahoo.com/neo/groups/pixelmed_dicom/info. If you do use DoseUtility, it can make good use of the whole study, if you let it, to create a much richer RDSR, which it can then send on to Conquest for import into OpenREM.

I've also just come across this, which might be worth trying: http://www-hsc.usc.edu/~phillimc/doseocr/index.html. However, it requires .NET, and when I tried it on my Windows 10 system I couldn't 'drop' any files onto it.

If you do find that doing your own OCR is the best way, when we have done this in the past we've resorted to creating a new 'font' and training it for the dose screen images we were using. We did this for example with the Siemens Zee until we got the RDSRs working.

I hope this has given you a few avenues to try!

Take care, and Happy Christmas.

Ed

David Platten

Dec 22, 2016, 3:54:21 PM
to OpenREM
Hi Ed,

I hadn't looked at DoseUtility for this. I've just tried it on one of the Toshiba files and it seems to do a great job. I'll look at it some more tomorrow. It would be good if I can make it an automatic process.

Thanks for the suggestion,

David

Ed McDonagh

Dec 22, 2016, 4:20:27 PM
to OpenREM
Yet another of my 'wouldn't it be nice if' ponderings for a long time has been an option to incorporate DoseUtility into OpenREM. 

I currently have a script that I run manually every now and then. It queries one of our GE scanners and pulls each study one by one, makes a DICOMDIR file, and generates an RDSR using DoseUtility; I then pass each of these into OpenREM. The whole process is a bit convoluted!

Ideally I'd just have an extra option on the QR interface in OpenREM: use DoseUtility if no RDSR is found, with a choice of whether to get all the data or just the dose screen. Then there would be a local_settings configuration for where the DoseUtility Java file is, and some extra documentation for installing the Java JRE or whatever it is that is needed.

However, for your immediate needs I don't know how much the GUI interface can be automated. I know it can QR remote systems, and then DICOM Store the resulting RDSRs onward, but all of that requires clicking in the interface. But I haven't looked into it either! 

Ed



David Platten

Dec 23, 2016, 5:51:47 AM
to OpenREM
Here's an update:

DoseUtility
This can produce an RDSR from the Toshiba dose info images, but the resulting DICOM object fails DoseUtility's in-built validation, giving around 15 errors saying that certain required things are missing:


Found XRayRadiationDoseSR IOD
Found Root Template TID_10011 (CTRadiationDose)
Error: Template 10011 CTRadiationDose/[Row 1] CONTAINER (113701,DCM,"X-Ray Radiation Dose Report")/[Row 6] DATETIME (113810,DCM,"End of X-Ray Irradiation"): within 1: /CONTAINER (113701,DCM,"X-Ray Radiation Dose Report"): Missing required content item
Error: Template 10013 CTIrradiationEventData/[Row 1] CONTAINER (113819,DCM,"CT Acquisition")/[Row 3] CODE (123014,DCM,"Target Region"): 1.11.1: /CONTAINER (113701,DCM,"X-Ray Radiation Dose Report")/CONTAINER (113819,DCM,"CT Acquisition")/CODE (123014,DCM,"Target Region"): Code (,,"") not found in context group 4030
Error: Template 10013 CTIrradiationEventData/[Row 1] CONTAINER (113819,DCM,"CT Acquisition")/[Row 7] CONTAINER (113822,DCM,"CT Acquisition Parameters")/[Row 8] NUM (113824,DCM,"Exposure Time"): within 1.11.4: /CONTAINER (113701,DCM,"X-Ray Radiation Dose Report")/CONTAINER (113819,DCM,"CT Acquisition")/CONTAINER (113822,DCM,"CT Acquisition Parameters"): Missing required content item
Error: Template 10014 ScanningLength/[Row 1] NUM (113825,DCM,"Scanning Length"): within 1.11.4: /CONTAINER (113701,DCM,"X-Ray Radiation Dose Report")/CONTAINER (113819,DCM,"CT Acquisition")/CONTAINER (113822,DCM,"CT Acquisition Parameters"): Missing required content item
Error: Template 10013 CTIrradiationEventData/[Row 1] CONTAINER (113819,DCM,"CT Acquisition")/[Row 7] CONTAINER (113822,DCM,"CT Acquisition Parameters")/[Row 10] NUM (113826,DCM,"Nominal Single Collimation Width"): within 1.11.4: /CONTAINER (113701,DCM,"X-Ray Radiation Dose Report")/CONTAINER (113819,DCM,"CT Acquisition")/CONTAINER (113822,DCM,"CT Acquisition Parameters"): Missing required content item
Error: Template 10013 CTIrradiationEventData/[Row 1] CONTAINER (113819,DCM,"CT Acquisition")/[Row 7] CONTAINER (113822,DCM,"CT Acquisition Parameters")/[Row 11] NUM (113827,DCM,"Nominal Total Collimation Width"): within 1.11.4: /CONTAINER (113701,DCM,"X-Ray Radiation Dose Report")/CONTAINER (113819,DCM,"CT Acquisition")/CONTAINER (113822,DCM,"CT Acquisition Parameters"): Missing required content item
Error: Template 10013 CTIrradiationEventData/[Row 1] CONTAINER (113819,DCM,"CT Acquisition")/[Row 7] CONTAINER (113822,DCM,"CT Acquisition Parameters")/[Row 13] NUM (113823,DCM,"Number of X-Ray Sources"): within 1.11.4: /CONTAINER (113701,DCM,"X-Ray Radiation Dose Report")/CONTAINER (113819,DCM,"CT Acquisition")/CONTAINER (113822,DCM,"CT Acquisition Parameters"): Missing required content item
Error: Template 10013 CTIrradiationEventData/[Row 1] CONTAINER (113819,DCM,"CT Acquisition")/[Row 7] CONTAINER (113822,DCM,"CT Acquisition Parameters")/[Row 14] CONTAINER (113831,DCM,"CT X-Ray Source Parameters"): within 1.11.4: /CONTAINER (113701,DCM,"X-Ray Radiation Dose Report")/CONTAINER (113819,DCM,"CT Acquisition")/CONTAINER (113822,DCM,"CT Acquisition Parameters"): Missing required content item
Error: Template 10013 CTIrradiationEventData/[Row 1] CONTAINER (113819,DCM,"CT Acquisition")/[Row 3] CODE (123014,DCM,"Target Region"): 1.12.1: /CONTAINER (113701,DCM,"X-Ray Radiation Dose Report")/CONTAINER (113819,DCM,"CT Acquisition")/CODE (123014,DCM,"Target Region"): Code (,,"") not found in context group 4030
Error: Template 10013 CTIrradiationEventData/[Row 1] CONTAINER (113819,DCM,"CT Acquisition")/[Row 7] CONTAINER (113822,DCM,"CT Acquisition Parameters")/[Row 8] NUM (113824,DCM,"Exposure Time"): within 1.12.4: /CONTAINER (113701,DCM,"X-Ray Radiation Dose Report")/CONTAINER (113819,DCM,"CT Acquisition")/CONTAINER (113822,DCM,"CT Acquisition Parameters"): Missing required content item
Error: Template 10013 CTIrradiationEventData/[Row 1] CONTAINER (113819,DCM,"CT Acquisition")/[Row 7] CONTAINER (113822,DCM,"CT Acquisition Parameters")/[Row 10] NUM (113826,DCM,"Nominal Single Collimation Width"): within 1.12.4: /CONTAINER (113701,DCM,"X-Ray Radiation Dose Report")/CONTAINER (113819,DCM,"CT Acquisition")/CONTAINER (113822,DCM,"CT Acquisition Parameters"): Missing required content item
Error: Template 10013 CTIrradiationEventData/[Row 1] CONTAINER (113819,DCM,"CT Acquisition")/[Row 7] CONTAINER (113822,DCM,"CT Acquisition Parameters")/[Row 11] NUM (113827,DCM,"Nominal Total Collimation Width"): within 1.12.4: /CONTAINER (113701,DCM,"X-Ray Radiation Dose Report")/CONTAINER (113819,DCM,"CT Acquisition")/CONTAINER (113822,DCM,"CT Acquisition Parameters"): Missing required content item
Error: Template 10013 CTIrradiationEventData/[Row 1] CONTAINER (113819,DCM,"CT Acquisition")/[Row 7] CONTAINER (113822,DCM,"CT Acquisition Parameters")/[Row 12] NUM (113828,DCM,"Pitch Factor"): within 1.12.4: /CONTAINER (113701,DCM,"X-Ray Radiation Dose Report")/CONTAINER (113819,DCM,"CT Acquisition")/CONTAINER (113822,DCM,"CT Acquisition Parameters"): Missing conditional content item
Error: Template 10013 CTIrradiationEventData/[Row 1] CONTAINER (113819,DCM,"CT Acquisition")/[Row 7] CONTAINER (113822,DCM,"CT Acquisition Parameters")/[Row 13] NUM (113823,DCM,"Number of X-Ray Sources"): within 1.12.4: /CONTAINER (113701,DCM,"X-Ray Radiation Dose Report")/CONTAINER (113819,DCM,"CT Acquisition")/CONTAINER (113822,DCM,"CT Acquisition Parameters"): Missing required content item
Error: Template 10013 CTIrradiationEventData/[Row 1] CONTAINER (113819,DCM,"CT Acquisition")/[Row 7] CONTAINER (113822,DCM,"CT Acquisition Parameters")/[Row 14] CONTAINER (113831,DCM,"CT X-Ray Source Parameters"): within 1.12.4: /CONTAINER (113701,DCM,"X-Ray Radiation Dose Report")/CONTAINER (113819,DCM,"CT Acquisition")/CONTAINER (113822,DCM,"CT Acquisition Parameters"): Missing required content item
Root Template Validation Complete
Warning: 1.10.2.1: /CONTAINER (113701,DCM,"X-Ray Radiation Dose Report")/CONTAINER (113811,DCM,"CT Accumulated Dose Data")/NUM (113813,DCM,"CT Dose Length Product Total")/CODE (113835,DCM,"CTDIw Phantom Type"): Content Item not in template
IOD validation complete
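
A log like the one above is easier to read once it is grouped by content item. The following is a rough sketch that does this, assuming each error line has the same shape as those shown (the item name is the last quoted string before the first `"): `, and the reason is the text after the final `: `). The function name is made up for illustration and is not part of DoseUtility.

```python
from collections import Counter


def summarise_validation_errors(log_text):
    """Count 'Error:' lines by (content item name, reason)."""
    counts = Counter()
    for line in log_text.splitlines():
        if not line.startswith("Error:") or '"): ' not in line:
            continue
        # Template path ends at the first '"): '; the item name is the
        # quoted string immediately before it.
        head = line.split('"): ', 1)[0]
        item = head.rsplit('"', 1)[1]
        # The reason is whatever follows the last ': ' on the line.
        reason = line.rsplit(": ", 1)[1]
        counts[(item, reason)] += 1
    return counts


# Example use, with the validation output saved to a text file:
#   with open("validation.log") as fh:
#       for (item, reason), n in summarise_validation_errors(fh.read()).items():
#           print(f"{n} x {item}: {reason}")
```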


The RDSR is rejected when trying to import it into OpenREM, "AttributeError: Dataset does not have attribute 'ContentSequence'.", which I suspect is due to the things that DoseUtility says are missing.
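
A check before import would at least turn that traceback into a clearer message. The sketch below is not OpenREM's import code; it assumes only that the parsed DICOM object exposes its top-level attributes by keyword (as a pydicom Dataset does), and uses simple stand-in objects so that it runs without any DICOM library. The helper name is hypothetical.

```python
from types import SimpleNamespace


def missing_attributes(dataset, required=("ContentSequence",)):
    """Return the names in `required` that the dataset does not provide."""
    return [name for name in required
            if getattr(dataset, name, None) is None]


# Stand-ins for parsed DICOM objects (assumption: attribute-style access).
with_content = SimpleNamespace(ContentSequence=["first content item"])
without_content = SimpleNamespace(Modality="SR")

for label, ds in (("with", with_content), ("without", without_content)):
    missing = missing_attributes(ds)
    if missing:
        print(f"{label}: skip import, missing {', '.join(missing)}")
    else:
        print(f"{label}: looks importable")
```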

Due to this missing information I don't think that this is the way to go. I'm also doubtful of whether a process involving DoseUtility can be fully automated.


Tesseract
I e-mailed Dr Tim O'Connell yesterday, after finding this RSNA poster on-line: https://www.rsna.org/uploadedFiles/RSNA/Content/Science_and_Education/Quality/3062-OConnell.pdf. Dr O'Connell has done exactly what I am trying to do. I had a reply from him this morning explaining his methodology, and most importantly he attached his Tesseract training file for the typeface used in the Toshiba dose images. Using this, the OCR works *perfectly* so far.


ct-dose-ocr
I've also looked at ct-dose-ocr from http://www-hsc.usc.edu/~phillimc/doseocr/index.html. This can be run from the command line, and also recognises the text in the Toshiba dose images *perfectly* so far. This programme also retains the layout of the text, which I think might make it easier to use than the output from Tesseract.