Agreed.
> There has been one patch submitted to enable this feature, but it is
> fairly poor (I know; I put it in.) It has been mentioned on IRC that
> using a crontab may be the way to go.
>
> My idea is to add a new post status to the poststatus table called
> 'future', and a new cronjob to the cron table to periodically update
> the posts table, changing all posts whose status is 'future' to a
> status of 'published' if the publication date of the post is
> earlier than the time at which the cronjob runs. My first impression is
> that the cronjob should run once a minute, since people will expect
> their post to be published when they tell it to be published, but I
> don't know what kind of load this would entail on a busy site. A less
> frequent schedule may be more suitable.
Your idea sounds reasonable to me. I think running once an hour would
be sufficient though, as long as authors know that's the case.
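Rick's proposed cronjob boils down to a single UPDATE. Below is a minimal sketch using Python's built-in sqlite3 module, assuming a hypothetical cut-down posts table with only `title`, `status` and `pubdate` columns (dates stored as `YYYY-MM-DD HH:MM:SS` strings so they compare correctly). Habari's real schema and DB layer differ.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    # Hypothetical, cut-down schema: just enough columns for the example.
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT, "
        "status TEXT, pubdate TEXT)"
    )
    return db


def publish_due_posts(db: sqlite3.Connection, now: str) -> int:
    """Flip every 'future' post whose pubdate has passed to 'published'.

    `now` is bound as a parameter instead of calling a SQL time function,
    so the statement text is identical on every run.
    """
    cur = db.execute(
        "UPDATE posts SET status = 'published' "
        "WHERE status = 'future' AND pubdate <= ?",
        (now,),
    )
    db.commit()
    return cur.rowcount  # number of posts that were published
```

Whether this runs every minute or every hour, the job itself stays this small; only the schedule changes.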
> When a post is saved, we would check to see if its status was set to
> 'published' by the user. If so, check the publication date. If it has
> been set to a date or time in the future, change the post status to
> 'future' internally before saving the post. Using these mechanisms,
> scheduled posts are enabled without changing the current interface
> that users see.
There has to be a change, because there'd have to be a date picker for
the publish date to be in the future. That could be hidden in the
splitter though.
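The save-time check Rick describes fits in a small pure function. A sketch, assuming dates are comparable `datetime` values in the same timezone; the function name is hypothetical, and 'future' is Rick's proposed status name (the patch committed later in this thread uses "scheduled").

```python
from datetime import datetime


def status_on_save(requested_status: str, pubdate: datetime, now: datetime) -> str:
    """Return the status to store when a post is saved.

    If the author chose 'published' but the publication date is still in
    the future, store 'future' instead; the interface itself is unchanged.
    """
    if requested_status == "published" and pubdate > now:
        return "future"
    return requested_status
```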
> The cronjob and new status can be added to the database when Habari is
> installed. For people who are currently using Habari, a plugin can be
> developed to add the cronjob and new status to their database.
> Alternatively, the database version can be bumped and an upgrade
> triggered to add the new data, though I'm not sure of the suitability
> of that since this entails no change to the database format, only to
> the data in it.
All of this can and should be accomplished through a plugin. I don't
think it should be core.
--
Michael C. Harris, School of CS&IT, RMIT University
http://twofishcreative.com/michael/blog
Chris
Since I am one of the people who suggested the use of cron, I like
it! My only question is the same one you raise here, namely how often
should the job run.
Chris
> Since I am one of the people who suggested the use of cron, I like
> it! My only question is the same one you raise here, namely how often
> should the job run.
Our cron is pretty efficient; I don't see the problem with running the
publisher cron every 1-5 minutes.
-Matt
I think the job's run interval really depends on how accurate we want scheduled posting to be. I'm guessing an hour would be more than sufficient for this, maybe 2 hours?
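The accuracy trade-off is easy to quantify: with a fixed interval, a post goes live at the first run at or after its scheduled time, so it can be late by almost one full interval. A sketch, assuming runs happen exactly on schedule (Habari's cron only fires on a page view, so real delays can be longer); the function name is hypothetical.

```python
from datetime import datetime, timedelta


def first_run_at_or_after(scheduled: datetime, first_run: datetime,
                          interval: timedelta) -> datetime:
    """Time of the first periodic run (first_run + k * interval) at or after `scheduled`."""
    if scheduled <= first_run:
        return first_run
    elapsed = scheduled - first_run
    steps = -(-elapsed // interval)  # ceiling division on timedeltas
    return first_run + steps * interval
```

For a post due at 10:01, an hourly job that runs at 10:00 publishes it at 11:00 (59 minutes late), while a 5-minute job publishes it at 10:05.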
The idea is to have a crontab auto-publish the post when it is due: save it as a draft and have Habari publish it at the scheduled time, rather than marking the post "published" while it actually is not.
WordPress used the model you described for quite some time, and as I
recall it introduced some pretty hefty performance penalties on big
sites as you include the MySQL NOW function in each query. That means
that the server needs to compute a new value for every query, which
means that queries (and their results) cannot be effectively cached.
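To illustrate the caching point, here is a toy result cache keyed on the exact SQL text. It is not MySQL's query cache (which simply refuses to cache any statement that calls NOW()); it shows the related effect of baking the current time into the query string, where every request produces a new key and nothing is ever reused. Class and function names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Callable, Dict, List


class QueryCache:
    """Toy result cache keyed on the exact SQL text (illustration only)."""

    def __init__(self) -> None:
        self.store: Dict[str, List[tuple]] = {}
        self.hits = 0
        self.misses = 0

    def get(self, sql: str, compute: Callable[[], List[tuple]]) -> List[tuple]:
        if sql in self.store:
            self.hits += 1
        else:
            self.misses += 1
            self.store[sql] = compute()
        return self.store[sql]


def time_filtered_sql(now: datetime) -> str:
    # The current time is part of the query text, so every request differs.
    return f"SELECT * FROM posts WHERE pubdate <= '{now:%Y-%m-%d %H:%M:%S}'"


# Filtering on status alone gives the same text on every request.
STATUS_SQL = "SELECT * FROM posts WHERE status = 'published'"
```

Three requests one second apart miss the cache three times with `time_filtered_sql`, but miss once and then hit twice with `STATUS_SQL`.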
I think that the use of cron allows us to better leverage our API. In
your model, Rich, a post's status changes from "scheduled" to
"published", but it's not entirely clear whether the entire stack of
actions triggered on such a status change would execute. If we use
cron, we can unambiguously execute Post::publish() on the item in
question, trigger ping notifications, pingbacks, and everything else:
all before the theme collects the list of published posts, to ensure
that the just-published post is now included in the current request.
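The ordering described above can be sketched as follows: cron work runs first and publishes each due item through a single publish entry point, so every hook fires, and only afterwards is the list of published posts collected. The hook registry and function names are hypothetical stand-ins for Habari's plugin actions and Post::publish().

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Post:
    title: str
    status: str = "scheduled"


# Hypothetical hook registry standing in for Habari's plugin actions
# (ping notifications, pingbacks, and so on).
publish_hooks: List[Callable[[Post], None]] = []


def publish(post: Post) -> None:
    """Analogue of Post::publish(): change the status, then fire every hook."""
    post.status = "published"
    for hook in publish_hooks:
        hook(post)


def handle_request(posts: List[Post], is_due: Callable[[Post], bool]) -> List[str]:
    """Run cron work first, then collect what the theme would display."""
    for post in posts:
        if post.status == "scheduled" and is_due(post):
            publish(post)  # same code path as a manual publish
    # The just-published post is already included in this request's list.
    return [p.title for p in posts if p.status == "published"]
```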
Cheers,
Scott
That is pretty much exactly how our cron works. On every page view, if
the cron job hasn't run in this period, it runs and changes all matching
future posts to published.
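Matt's description, where each page view checks whether the job has already run in the current period, can be sketched like this. The class name and the injectable clock (used so the behaviour can be tested deterministically) are assumptions, not Habari's CronTab API.

```python
import time
from typing import Callable


class PeriodicCron:
    """Run `job` at most once per `period` seconds, checked on each page view."""

    def __init__(self, job: Callable[[], None], period: float,
                 clock: Callable[[], float] = time.time) -> None:
        self.job = job
        self.period = period
        self.clock = clock
        self.last_run = None  # no run yet

    def on_page_view(self) -> bool:
        """Run the job if this period has not seen a run yet; report whether it ran."""
        now = self.clock()
        if self.last_run is None or now - self.last_run >= self.period:
            self.last_run = now
            self.job()
            return True
        return False
```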
-Matt
Here's what I think should happen, although I know that it currently
does not.
The primary reason you would set things up the following way is to avoid
executing non-cacheable queries - queries in which a date changes in the
query on every execution - on every page load. Not doing things as
described below (or by taking some similar steps to avoid non-cacheable
queries and synchronous cron execution) will cause significant, visible
performance degradation.
When you publish an entry that is scheduled for the future, its status
needs to be set to "scheduled". This simple switch allows for entries
in the future to remain out of the list of posts that are published to
the home page (unless you're running some crazy Post::get() call that
gets posts of any status, which would not be standard). I imagine that
the setting of the "scheduled" status could be transparent to the user -
they'd set a future date and status of "published", and upon save Habari
would transparently change the status to "scheduled" without user
interaction.
When a scheduled post is saved, a query is performed to find the next
scheduled post. It could be the one just entered, but it may be some
other post. The scheduled date of the post is noted.
A specifically named cronjob is then created or updated and set to
execute at the time of the next scheduled post. CronJobs can be set to
execute at specific times; if no visitor triggers the job at that moment,
it executes on the next visit that follows.
The job runs a function specific to updating scheduled posts whose times
have come to "published" status. At this time, if there are scheduled
posts remaining, the CronJob is updated to execute at the time of the
next scheduled post.
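The single-job cycle in the three paragraphs above (find the next scheduled post, point one named job at it, publish what is due, then reschedule) can be sketched with sqlite3. The crontab is modelled as a plain `{job_name: next_run}` dict; the job name is hypothetical, and the two function names borrow those mentioned later in this thread, but their bodies here are guesses. A bulk UPDATE is used for brevity, whereas the thread favours calling the per-post publish path.

```python
import sqlite3
from typing import Dict, Optional

# Hypothetical job name; the crontab is modelled as {job_name: next_run}.
# Dates are 'YYYY-MM-DD HH:MM:SS' strings so they sort correctly.
JOB_NAME = "publish_scheduled_posts"


def next_scheduled_pubdate(db: sqlite3.Connection) -> Optional[str]:
    row = db.execute(
        "SELECT MIN(pubdate) FROM posts WHERE status = 'scheduled'"
    ).fetchone()
    return row[0]  # None when no scheduled posts remain


def update_scheduled_posts_cronjob(db: sqlite3.Connection,
                                   crontab: Dict[str, str]) -> None:
    """Point the single named job at the next scheduled post, or drop it."""
    when = next_scheduled_pubdate(db)
    if when is None:
        crontab.pop(JOB_NAME, None)
    else:
        crontab[JOB_NAME] = when  # same name, so this creates or updates


def publish_scheduled_posts(db: sqlite3.Connection, crontab: Dict[str, str],
                            now: str) -> int:
    """The job body: publish everything that is due, then reschedule."""
    cur = db.execute(
        "UPDATE posts SET status = 'published' "
        "WHERE status = 'scheduled' AND pubdate <= ?",
        (now,),
    )
    db.commit()
    published = cur.rowcount
    # Called unconditionally; it handles the "nothing left" case itself.
    update_scheduled_posts_cronjob(db, crontab)
    return published
```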
The advantage of this process is that there is only ever a single
CronJob running for all scheduled posts. This CronJob runs at exactly
the moment that the scheduled post is to be published (eliminating all
of the "how frequently do we do this?" questions).
To clarify, for the benefit of the bulk of this thread, the Habari
CronTab and CronJob classes do not execute like a unix cron. You can
set any job to run once at a specific time, or multiple times at an
interval, or even many times with an end time. It is much more flexible
than unix cron. The only drawback with it is that it runs only when the
site is visited, so tasks that should be automated even when there are
no visitors will not run on their own. However, this can be
accomplished by setting a unix cron that fetches the Habari cron URL
periodically.
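A sketch of the three schedule shapes listed above (run once, repeat at an interval, repeat until an end time) combined with visit-triggered execution. The field names, and the choice to compute the next run from the time the job actually ran, are assumptions and not necessarily how Habari's CronJob class behaves.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CronJobSpec:
    """Hypothetical job schedule; times are plain numbers (e.g. Unix seconds)."""

    start: float
    increment: Optional[float] = None  # None: run exactly once
    end: Optional[float] = None  # None: repeat indefinitely

    def next_run_after(self, ran_at: float) -> Optional[float]:
        """Next due time after running at `ran_at`, or None if the job is finished."""
        if self.increment is None:
            return None
        nxt = ran_at + self.increment
        if self.end is not None and nxt > self.end:
            return None
        return nxt


def simulate(spec: CronJobSpec, visits: List[float]) -> List[float]:
    """Return the visit times at which the job actually executes.

    Nothing runs between visits: a due job waits for the next page view.
    """
    ran: List[float] = []
    next_run: Optional[float] = spec.start
    for t in sorted(visits):
        if next_run is not None and t >= next_run:
            ran.append(t)
            next_run = spec.next_run_after(t)
    return ran
```

With start=100, increment=60, end=300 and visits at 50, 120, 150, 190, 260 and 400, the job runs at 120, 190 and 260; the visit at 400 does nothing because the next run (320) would fall after the end time.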
Most significant of all, this process allows us to run a single,
cacheable query to return all CronJobs and loop through them, executing
only those whose execution times have passed. The CronTab currently
does not do this, and is causing an inefficiency. What should happen is
the CronTab should query for ALL CronJobs with a single "SELECT * FROM
{crontab}" and then loop through them in memory to detect which ones
need to execute, updating as appropriate. This will yield significant
performance gains.
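The single-query approach, sketched with sqlite3: the SELECT text never changes, and the "is it due?" decision happens in a loop in memory. The table layout (`name`, `next_run`) and the handler convention (return the next run time, or None for a finished job) are assumptions for the example.

```python
import sqlite3
from typing import Callable, Dict, List, Optional

# Handler convention (an assumption): given the current time, do the work and
# return the job's next run time, or None if the job is finished.
Handler = Callable[[float], Optional[float]]


def run_due_jobs(db: sqlite3.Connection, now: float,
                 handlers: Dict[str, Handler]) -> List[str]:
    # Constant query text: nothing in it depends on the current time.
    rows = db.execute("SELECT name, next_run FROM crontab").fetchall()
    ran: List[str] = []
    for name, next_run in rows:
        if next_run > now:
            continue  # not due yet; filtered in memory, not in SQL
        new_next = handlers[name](now)
        if new_next is None:
            db.execute("DELETE FROM crontab WHERE name = ?", (name,))
        else:
            db.execute("UPDATE crontab SET next_run = ? WHERE name = ?",
                       (new_next, name))
        ran.append(name)
    db.commit()
    return ran
```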
One downside to this process is that because cron should execute
asynchronously, the request that triggers the cron will likely not
benefit from its execution. A user who triggers the cron on a publish
should not need to wait for the process behind the cron to execute
before displaying their request, so we execute crons asynchronously.
What should happen is when a visitor requests the page, Habari will
detect whether cron jobs should execute, and if so, issue itself via
RemoteRequest another HTTP request to the cron URL. This allows
execution on the visitor's request to continue without impedance. The
cron URL requested on a separate thread will do the actual work.
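The fire-and-forget pattern can be sketched as below. Habari would issue an HTTP request to its own cron URL via RemoteRequest; in this sketch a background thread stands in for that separate request, and the function and parameter names are hypothetical.

```python
import threading
from typing import Callable


def handle_page_request(
    render_page: Callable[[], str],
    cron_is_due: Callable[[], bool],
    run_cron: Callable[[], None],
) -> str:
    """Serve the visitor's page without waiting for cron to finish."""
    if cron_is_due():
        # Stand-in for the separate HTTP request to the cron URL.
        worker = threading.Thread(target=run_cron, daemon=True)
        worker.start()  # fire and forget: deliberately no join()
    # The page may be rendered before the cron work finishes, so a
    # just-published post might only appear on a later request.
    return render_page()
```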
The reason for this is that publishing a post can be a time-consuming
process since Habari may need to contact many external servers for
pings, etc. As a result, since the cron execution actually happens in a
separate/subsequent request, the user who triggered the cron won't see
the result of the cron unless they reload. I think that this is a fair
trade-off for smooth site running, and anyone uncomfortable with that
delay should set up a unix cron to periodically poll Habari's cron URL,
which will most likely obviate the issue.
If this is all still unclear, maybe I can work up some flowcharts later
on that depict the process more clearly.
Owen
I think future posting should be a core feature. It's a default
option in many other weblog applications, and it provides a very
useful feature.
Our invocation of this feature would provide a nice built-in
demonstration of how to use our cron system, too.
Cheers,
Scott
Graham Christensen
http://itrebal.com - Customized Web Hosting
Graham.Ch...@iamgraham.net
I think I hit all the big stuff.
--
--Robert Deaton
http://lushlab.com
A cursory overview of the patch looks good. A few comments / questions:
In my opinion, the list of scheduled posts in the dashboard should
only show up if there are, in fact, scheduled posts.
When publishing a future-dated item, we should probably indicate to
the user that the future-date has been preserved. Something along the
lines of "Your post has been scheduled for publication at ...".
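Both suggestions are small pieces of presentation logic. A sketch with hypothetical function names and a hypothetical message for the non-scheduled case:

```python
from datetime import datetime
from typing import List, Optional


def save_message(status: str, pubdate: datetime) -> str:
    """Feedback shown after saving; only scheduled posts mention the date."""
    if status == "scheduled":
        return f"Your post has been scheduled for publication at {pubdate:%Y-%m-%d %H:%M}."
    return "Your post has been saved."  # hypothetical wording


def scheduled_posts_module(titles: List[str]) -> Optional[str]:
    """Dashboard module text, or None so the module is not rendered at all."""
    if not titles:
        return None
    return "Scheduled posts: " + ", ".join(titles)
```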
Have you tested this yourself, Rick? How has it worked for you?
Cheers,
Scott
I've tested this a little bit on localhost, and it seems to be
functional. Good work!
The only niggling complaint I have after minimal testing is that
future-dated posts show up as "published" to the author, with no
styling differentiation. One needs to look at the site without being
logged in to confirm that the future-dated post is not visible to
everyone.
This caused us a lot of grief with drafts, originally, as they, too,
were displayed on the front page to the logged-in user without styling
differentiation. We should find a way to visually disambiguate the
future-dated posts to reduce confusion.
Ali B. replied:
> I think future-posted posts are considered drafts until they are due to be
> published and thus they can simply be displayed the way drafts are.
Incorrect. They're stored with a status of "scheduled" in the DB.
You may filter by this status in the /admin/content view.
When viewing the home page of the site, though, future posts are
displayed with a status of "published".
Ah-ha! I was looking at the wrong div when I checked this the first
time. Looking at it again, I see this:
<div id="post-9" class="scheduled">
So yes, I think our core themes should include a style for the
"scheduled" class.
Nice work, Rick!
A visual review (haven't applied the patch yet) reveals a little
weirdness in the Posts class. Are calls to DB::set_fetch_mode() and
DB::set_fetch_class() necessary in Posts::publish_scheduled_posts()?
Also, I'm not sure why the values sent to add_single_cron() in
Posts::update_scheduled_posts_cronjob() are first assigned to variables.
Shouldn't affect anything, just curious.
Have you tried scheduling two or more posts into the future? My read
through the code indicates that posts after the first scheduled one will
not publish (unless you publish something manually in between), but maybe
I've misread something? I expected to see a rescheduling of the cron
after the current scheduled post is published.
If you alter update_scheduled_posts_cronjob() slightly to account for
the case where no scheduled post exists, you could cleanly call
update_scheduled_posts_cronjob() without condition upon completion of
publish_scheduled_posts().
Owen
No, I simply missed that update_scheduled_posts_cronjob() is being
called from Post on every status change involving scheduled posts.
Owen
After the discussion in this thread, and some minor revisions, I have
committed this patch to the core. If you're following SVN HEAD for
your site, you'll need to manually add the "scheduled" post status to
your database:
INSERT INTO habari__poststatus (name, internal) VALUES ('scheduled', 1);
Thanks a lot, Rick, for the excellent patch. This is a great addition
to Habari.
Cheers,
Scott
On Thu, Apr 17, 2008 at 6:28 PM, rick c <rickc...@gmail.com> wrote:
> Scheduled posts ( http://trac.habariproject.org/habari/ticket/36 ) are a
> feature that has garnered a fair amount of interest, and as Habari
> becomes more popular the demand for it will grow.