A design for a large-scale URL or HTML to image system

Sep 1, 2007, 11:49:21 PM
to html to image
Htmlsnapshot is a component that converts HTML or a URL to an image.
It can be used to build a large-scale URL-to-image conversion system.
Here, "large" means the system is designed to eventually convert
millions of URLs to images.

My recommendation for making the system robust is to use a
process-based method. The system has one main process. It maintains
the list of URLs and launches child worker processes to do the actual
HTML-to-image conversion. Each worker process creates only one
Snapshot object, processes a few URLs (maybe 1 or 10; the number can
be predefined), and exits.
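
The main-process side of this can be sketched roughly as below. This
is only an illustration in Python, not part of Htmlsnapshot: the
worker command line is hypothetical, and it assumes a worker program
that takes its URLs as arguments and exits with a nonzero code on
failure.

```python
import subprocess
from typing import Iterator, List, Sequence


def batches(urls: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` URLs."""
    for start in range(0, len(urls), size):
        yield list(urls[start:start + size])


def dispatch(urls: Sequence[str], worker_cmd: Sequence[str],
             batch_size: int = 10) -> List[List[str]]:
    """Launch one short-lived worker process per batch of URLs.

    `worker_cmd` is the (hypothetical) command that starts a worker;
    the URLs of the batch are appended as extra arguments.
    Returns the batches whose worker exited with a nonzero code,
    so the main process can retry or log them.
    """
    failed = []
    for batch in batches(urls, batch_size):
        result = subprocess.run(list(worker_cmd) + batch)
        if result.returncode != 0:
            failed.append(batch)
    return failed
```

For example, dispatch(url_list, ["snapshot_worker.exe"], batch_size=10)
would start a fresh worker for every 10 URLs, where
snapshot_worker.exe stands for whatever program wraps the Snapshot
object. This sketch runs workers one at a time; a real dispatcher
would probably keep several running in parallel.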

The benefits:
1. Snapshot objects cannot affect each other. Htmlsnapshot itself is
thread safe in general, but the underlying WebBrowser control is
complex and might have concurrency issues in rare extreme conditions.
Keeping one object per process avoids unknown issues around threading.

2. By wrapping the snapshot function in a worker process, your main
process always has control to start, stop, or terminate a worker
process. Because your main process is only a monitor and job
dispatcher, it will be very robust. For example, it can stop a worker
process that hangs unexpectedly, such as when a download dialog pops
up.

Some may worry about performance compared to the in-proc method. I
would say that for large-scale conversion jobs, robustness is more
important. And the time to launch a process is much shorter than the
time to download a page from the internet.
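
Stopping a hung worker from the main process could look something
like the following. Again this is a minimal Python sketch under
assumptions, not Htmlsnapshot code: the command and the timeout value
are placeholders you would choose yourself.

```python
import subprocess
from typing import Optional, Sequence


def run_with_timeout(cmd: Sequence[str], timeout_s: float) -> Optional[int]:
    """Run a worker process and kill it if it runs longer than timeout_s.

    Returns the worker's exit code, or None if it had to be killed
    (for example because a dialog popped up and blocked it).
    """
    proc = subprocess.Popen(list(cmd))
    try:
        return proc.wait(timeout=timeout_s)
    except subprocess.TimeoutExpired:
        proc.kill()   # forcibly terminate the hung worker
        proc.wait()   # collect the exit status so nothing is left behind
        return None
```

A None result tells the main process that the batch was not finished,
so it can put those URLs back on the list or mark them as
problematic.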
