Need DBA help with Movable Type > WordPress migration

19 views
Skip to first unread message

David Veksler

unread,
Feb 28, 2010, 4:07:19 AM2/28/10
to Mises.org Development
I need some DBA help with our blog migration.

WordPress is not capable of handling our 76 MB MT export file, and it can't import the post_id's we need for redirects to work.

So we have to migrate via MySql script.

I need some help writing it.  This is what I have so far:

INSERT INTO wp_posts
(id, Post_title, post_name, Post_content, post_author,  post_date)
SELECT
 entry_id, entry_title, entry_basename, CONCAT(entry_text,entry_text_more) AS post_text, entry_author_id, entry_created_on FROM mises_blog..mt_entry
We need to migrate posts, authors, and comments, maintaining the primary keys.

David Veksler

unread,
Feb 28, 2010, 4:15:51 AM2/28/10
to Mises.org Development
If you wish to try the migration yourself, here is a zip export from MT:  http://blog.mises.com/mises_economics_blog.zip

Note that I have modified the Movable Type export script to add the ID: field.  

Also, we dont need to import any author information other than the name.

PEZ

unread,
Feb 28, 2010, 4:56:28 AM2/28/10
to Mises.org Development
When you say "Wordpress isn't capable", is it because of the size of
the data or something else?
/PEZ

David Veksler

unread,
Feb 28, 2010, 2:18:55 PM2/28/10
to mise...@googlegroups.com
Both.   It doesn't support importing the postID and it quits (with a success message) before importing all the posts.

Also I am convinced that PHP is the worst language/framework today. It is glacially slow and it considers (according to the documentation) 5MB is a "large" file.  I have a sever with RAID1, 16 cores and 6GB DDR3 and it's taking forever using 1% of CPU and 15MB of memory.

Peter Strömberg

unread,
Feb 28, 2010, 2:39:44 PM2/28/10
to mise...@googlegroups.com
Why must the postID be kept?

Have you tried editing php.ini to allow for > 5MB of memory to be
allocated by a script? Or if it is something with max upload size. Or
both. Does it still quit without importing anything?

/PEZ

Jeffrey Tucker

unread,
Feb 28, 2010, 2:44:41 PM2/28/10
to mise...@googlegroups.com
This is probably my doing - we need to maintain legacy links from the main site. If you find an article from 5 years ago, it's blog link takes you directly to the comment box. This is why this is important.


2010/2/28 Peter Strömberg <p...@pezius.com>

Peter Strömberg

unread,
Feb 28, 2010, 3:14:32 PM2/28/10
to mise...@googlegroups.com
I totally agree that old links should be maintained.

Since the export file has ID fields, does that mean that we're
following this recipe http://www.muditajournal.com/archives/000280.php
?

/PEZ

David Veksler

unread,
Feb 28, 2010, 3:23:27 PM2/28/10
to mise...@googlegroups.com
#Change upload limits
memory_limit 540M
post_max_size 330M
upload_max_filesize 320M
max_execution_time 60000
#Change upload limits end

It quits after 800 posts.  S
2010/2/28 Peter Strömberg <p...@pezius.com>

David Veksler

unread,
Feb 28, 2010, 3:24:38 PM2/28/10
to mise...@googlegroups.com
That recipe is for WordPress 2.0.

Wordpress import in 2.0 doesn't work. Wordpress 2.0 imports 1/3 of the posts.  The other 2/3 are missing - with no error message.



2010/2/28 Peter Strömberg <p...@pezius.com>

Peter Strömberg

unread,
Feb 28, 2010, 3:31:04 PM2/28/10
to mise...@googlegroups.com
That's hope in'it? =) How about timeout limits in the web server? Is it Apache?

/PEZ

David Veksler

unread,
Feb 28, 2010, 3:39:25 PM2/28/10
to mise...@googlegroups.com
It doesn't return any errors.  

All timeouts are off.

Peter Strömberg

unread,
Feb 28, 2010, 3:45:15 PM2/28/10
to mise...@googlegroups.com
Which version of Wordpress are you using? 2.9.2? I think I'll give
that import a try myself.

/PEZ

David Veksler

unread,
Feb 28, 2010, 3:50:45 PM2/28/10
to mise...@googlegroups.com
Yes, please do.

David Veksler

unread,
Feb 28, 2010, 5:23:53 PM2/28/10
to mise...@googlegroups.com
Is there really no one on this list who knows sql?  Where are all the open-source advocates?

Peter Strömberg

unread,
Feb 28, 2010, 5:29:45 PM2/28/10
to mise...@googlegroups.com
Looking at post 2219, it says:

Importing post Results of M�ndliche Pr�fung, Mises University, Session
Two, 2004...

Doesn't look right, what's the charset/encoding used in the MT install?

/PEZ

2010/2/28 Peter Strömberg <p...@pezius.com>:
> Importing now. Using stock 2.9.2 import so it probably doesn't honor
> the ID:s, but I should be able to patch the script for that. So far
> 2120 entries imported. I don't know how many there are to import and
> it's getting late here so I'll leave my MacBook on and see if it has
> finished when I wake up.

Peter Strömberg

unread,
Feb 28, 2010, 5:26:39 PM2/28/10
to mise...@googlegroups.com
Importing now. Using stock 2.9.2 import so it probably doesn't honor
the ID:s, but I should be able to patch the script for that. So far
2120 entries imported. I don't know how many there are to import and
it's getting late here so I'll leave my MacBook on and see if it has
finished when I wake up.

/PEZ

Peter Strömberg

unread,
Feb 28, 2010, 5:46:40 PM2/28/10
to mise...@googlegroups.com
Now the import crashed like so:

Importing post Unpromising start to Becker-Posner Blog...
Fatal error: Allowed memory size of 134217728 bytes exhausted (tried
to allocate 71 bytes) in .../wp-includes/plugin.php on line 302

Going to bed. But this should be fixable if we split the import file up some.

RPG

unread,
Feb 28, 2010, 7:26:37 PM2/28/10
to Mises.org Development
I saw the call for help on twitter so I'm stopping by to offer my
2cents. I agree with Peter's suggestion. If the import function
inserts one row at a time the process could take longer than the
timeout settings in apache and php.ini. Chopping the data into
chunks and loading each separately is worth a shot. If that doesn't
work I'd try the "LOAD DATA INFILE" method from the command line.

David Veksler

unread,
Feb 28, 2010, 7:41:13 PM2/28/10
to mise...@googlegroups.com
I tried that.  It's still randomly skipping posts even with 5MB chunks.

Also, Movable Type only exported half the posts.

Open source runtimes suck.

RPG

unread,
Feb 28, 2010, 8:08:47 PM2/28/10
to mise...@googlegroups.com
Do you have root access to the server? I ask because I think it's time
to start changing settings in php.ini to enable the export functions of
MT and the import functions of WP to have enough time to complete.


David Veksler wrote:
> I tried that. It's still randomly skipping posts even with 5MB chunks.
>
> Also, Movable Type only exported half the posts.
>
> Open source runtimes suck.
>
> ---
> Regards,
> David V.
>
> PGP Key: http://www.rationalmind.net/david/DavidLeoVeksler.txt
>
>
> On Sun, Feb 28, 2010 at 6:26 PM, RPG <ini...@gmail.com
> <mailto:ini...@gmail.com>> wrote:
>
> I saw the call for help on twitter so I'm stopping by to offer my
> 2cents. I agree with Peter's suggestion. If the import function
> inserts one row at a time the process could take longer than the
> timeout settings in apache and php.ini. Chopping the data into
> chunks and loading each separately is worth a shot. If that doesn't
> work I'd try the "LOAD DATA INFILE" method from the command line.
>
>

> On Feb 28, 3:31 pm, Peter Str�mberg <p...@pezius.com


> <mailto:p...@pezius.com>> wrote:
> > That's hope in'it? =) How about timeout limits in the web
> server? Is it Apache?
> >
> > /PEZ
> >
> > On Sun, Feb 28, 2010 at 9:23 PM, David Veksler <her...@gmail.com
> <mailto:her...@gmail.com>> wrote:
> > > #Change upload limits
> > > memory_limit 540M
> > > post_max_size 330M
> > > upload_max_filesize 320M
> > > max_execution_time 60000
> > > #Change upload limits end
> > > It quits after 800 posts. S
> > > ---
> > > Regards,
> > > David V.
> >
> > > PGP Key:http://www.rationalmind.net/david/DavidLeoVeksler.txt
> >

> > > 2010/2/28 Peter Str�mberg <p...@pezius.com
> <mailto:p...@pezius.com>>

magalhaes

unread,
Feb 28, 2010, 8:10:45 PM2/28/10
to Mises.org Development
Hi,

The link you gave for the export (http://blog.mises.com/
mises_economics_blog.zip ) is returning 404 Not Found.

If PHP/Apache is yielding time-outs maybe the best way is as RPG
suggests to use LOAD DATA INFILE (see http://dev.mysql.com/doc/refman/5.1/en/load-data.html).
If that fails, then the remedy is as you said to write a SQL script. I
have some experience with MySQL (with and without PHP) but not with
WordPress. Anyway, if the correspondence between tables/columns is
known, it shouldn't be too hard.

So I ask: what is the correspondence between tables/columns?

BTW - In the query which you present, the number of columns in *1, *2
and *3 must be the same, so your query will not work.

INSERT INTO wp_posts (*1)
SELECT *2 AS *3
FROM mises_blog..mt_entry

João


On Mar 1, 12:41 am, David Veksler <her...@gmail.com> wrote:
> I tried that.  It's still randomly skipping posts even with 5MB chunks.
>
> Also, Movable Type only exported half the posts.
>
> Open source runtimes suck.
>
> ---
> Regards,
> David V.
>
> PGP Key:http://www.rationalmind.net/david/DavidLeoVeksler.txt
>

David Veksler

unread,
Feb 28, 2010, 9:18:15 PM2/28/10
to mise...@googlegroups.com
Sorry, I put it back.

David Veksler

unread,
Feb 28, 2010, 9:27:55 PM2/28/10
to mise...@googlegroups.com
>  Do you have root access to the server?   I ask because I think it's time to start changing settings in php.ini to enable the export functions of MT and the import functions of WP to have enough time to complete.

Yes, there is no configuration value limiting the process.  PHP and Perl are just shitty runtimes not designed to handle real-life data volumes.

I attached the schemas for WordPress and Movable Type.

Here is a script written for Drupal -> WordPress which I found helpful: http://www.rufuspollock.org/2005/10/10/migrating-drupal-to-wordpress/

Note that we only need the data in the MT export file migrated.
WP_schema.sql
MT_schema.sql

RPG

unread,
Feb 28, 2010, 10:24:54 PM2/28/10
to mise...@googlegroups.com
Under the "Resource Limits" section of your php.ini config file there
should be a max_execution_time variable, it is this variable that I
would look at first. In a normal "production" environment it should be
set just long enough to complete all normal running php processes. It
is my guess that the maintenance functions that you are trying to
execute are taking much longer than the current setting. As a result of
this php kills the process because it is running too long. This is
done to avoid overloading the server because of bad script, etc. My
guess is that the production value of max_execution_time is set to "30"
(seconds), temporary bump that up to "900", restart apache and see if
your export/import functions work.

Greg Bacon

unread,
Feb 28, 2010, 10:59:17 PM2/28/10
to mise...@googlegroups.com
David Veksler <her...@gmail.com> wrote:

> Yes, there is no configuration value limiting the process.  PHP and Perl are
> just shitty runtimes not designed to handle real-life data volumes. [...]

Did you attempt to upload and import in the same operation over HTTP, or did you FTP the MT export to wp-content/mt-export.txt and tell WP to work on local data? The latter is cruising along happily on my little underpowered Vista laptop.

Jenette Fulda wrote last month about migrating MT to WP, including preservation of permalinks using MT Blog Import Utilities rather than datamodel hacks.

I get that you're frustrated, but spraying ignorant comments doesn't help the situation. I had hoped a similar snipe from another poster the other day was an anomaly.

Greg

alvarocoronel

unread,
Feb 28, 2010, 11:23:14 PM2/28/10
to Mises.org Development
In MT, the values of entry_allow_comments are 0 and 1. We need to know
what they map to in WP

Also, the values of entry_status are 1 and 2.

Maybe you can post a couple of entries and play with their status a
little bit while querying the database, then post the observed values.

Actually, the values of other fields could be useful as well. For
example, "post_type"

Best regards,
Álvaro.

DBra...@mises.com

unread,
Feb 28, 2010, 11:21:46 PM2/28/10
to Mises.org Development
You can speed up the import by dropping the table constraints. The
primary key requires the data to be in key order or it will cause a
resort on every insert. Any additional indexes are B+ trees requiring
re-balancing after every insert. It's better to let the database sort
the table one time and build each index one time.

Does MySQL allow you to add the AUTO_INCREMENT constraint with a
specified starting point after the data is loaded? (MSSQL does) That
would solve your ID problem.

Message has been deleted

alvarocoronel

unread,
Mar 1, 2010, 12:07:38 AM3/1/10
to Mises.org Development
This should take a few seconds. Authors, posts and comments.
Please note that I couldn't find some fields in the MT schema...

/* users from author */
insert into wp_users (
ID,
user_login,
user_pass,
user_nicename,
user_email,
user_url,
user_registered,
/* user_activation_key, */
user_status
/* display_name -> derive from either login or nicename*/

) (select
author_id,
author_name,
author_password, /* fingers crossed */
author_nickname,
author_email,
author_url,
author_created_on,
/* user_activation_key, */
author_status
from misesmt.mt_author);


/* post from entry */
insert into wp_posts (
ID,
post_author,
post_date,
post_content,
post_title,
post_excerpt,
post_status,
comment_status,
ping_status,
/* post_password */
/* --post_name */
/* --to_ping */
/* --pinged */
post_modified,
/* --post_modified_gmt, */
/* --post_content_filter, */
/* --post_parent, */
/* --guid, */
/* --menu_order, */
/* --post_type, */
/* --post_mime_type, */
comment_count

) (
select
entry_id,
entry_author_id,
entry_created_on,
CONCAT(entry_text,entry_text_more) ,
entry_title,
entry_excerpt,
trim(cast(entry_status as char)),
trim(cast(entry_allow_comments as char)),
trim(cast(entry_allow_pings as char)),
/* --post_password */
/* --post_name */
/* --to_ping */
/* --pinged */
entry_modified_on,
/* -- post_modified_gmt, */
/* --post_content_filter, */
/* --post_parent, */
/* --guid, */
/* --menu_order, */
/* --post_type, */
/* --post_mime_type, */
entry_comment_count
from misesmt.mt_entry);

insert into wp_comments (
comment_ID,
comment_post_ID,
comment_author,
comment_author_email,
comment_author_url,
comment_author_IP,
comment_date,
/* comment_date_gmt, */
comment_content,
/* comment_karma, junk_score? */
/* comment_approved, comment_junk_status?? */
/* comment_agent, */
/* comment_type, */
comment_parent,
user_id
) ( select
comment_id,
comment_entry_id,
comment_author,
comment_email,
comment_url,
'127.0.0.127',
comment_created_on,
/* comment_date_gmt, */
comment_text,
/* comment_karma, */
/* comment_approved, */
/* comment_agent, */
/* comment_type, */
comment_parent_id,
comment_created_by
from misesmt.mt_comment);

David Veksler

unread,
Mar 1, 2010, 12:08:29 AM3/1/10
to mise...@googlegroups.com
Don't worry about any of those values because they can be set globally after the fact.

Here is the first post from wp_posts for the defaults:

1 1 2010-02-27 06:45:03 2010-02-27 06:45:03 Welcome to WordPress. This is your first post. Edit or delete it, then start blogging! Hello world! publish open open hello-world 2010-02-27 06:45:03 2010-02-27 06:45:03 0 http://blog.mises.com/?p=1 0 post 0

Rick Morris

unread,
Mar 1, 2010, 1:09:51 AM3/1/10
to Mises.org Development
On Feb 28, 9:27 pm, David Veksler <her...@gmail.com> wrote:
> >  Do you have root access to the server?   I ask because I think it's time
>
> to start changing settings in php.ini to enable the export functions of MT
> and the import functions of WP to have enough time to complete.
>
> Yes, there is no configuration value limiting the process.  PHP and Perl are
> just shitty runtimes not designed to handle real-life data volumes.
>

In fairness the problem is not PHP nor even MySQL, as much as MySQL
frustrates me sometimes. It is just a fact of how the Wordpress import
scripts function. I am running the import right now on my FreeBSD
server and httpd has been churning at ~96% CPU usage for 20 minutes so
far, while MySQL is only at 2-3%. From what I see, this is because the
import forces every post to go through a validation routine called
apply-filters() ( called from wp-includes/comment.php, but the
function exists inside wp-includes/plugin.php). This is normally
invoked per-post as submitted by authors. It was not really intended
for high volume since it does a lot of array traversal and dynamic
function calls. Not being a Wordpress expert, I don't know if this can
be safely disabled for a high-volume import, but tomorrow I will ask a
friend who should know.

If this import manages to complete on my server, I will gladly provide
a MySQL dump for you.

David V

unread,
Mar 1, 2010, 1:19:46 AM3/1/10
to Mises.org Development
Alvaro wrote the script we needed.. Thanks!

j. andrew

unread,
Mar 1, 2010, 10:13:24 AM3/1/10
to Mises.org Development
On Feb 28, 2:18 pm, David Veksler <her...@gmail.com> wrote:
> Also I am convinced that PHP is the worst language/framework today.

indeed. but like the constitution, you won't find anything else much
better.

> #Change upload limits

there is no reason for you to run scripts to rebuild the database over
the web API. use the mysql and php command line interfaces.

Peter Strömberg

unread,
Mar 1, 2010, 10:38:02 AM3/1/10
to mise...@googlegroups.com
That's great.

Is the server you are importing on the planned production environment?
If so we should make sure it's configured well, because the strange
import behaviour you experienced yesterday (with interruption without
error messages and such) seems to indicate that there's something
wrong.

/PEZ

Andrew Rodland

unread,
Mar 2, 2010, 4:17:07 PM3/2/10
to Mises.org Development
On Feb 28, 8:27 pm, David Veksler <her...@gmail.com> wrote:
> Yes, there is no configuration value limiting the process.  PHP and Perl are
> just shitty runtimes not designed to handle real-life data volumes.

Here's a fun fact that that I've learned from a decade of doing gratis
support for open-source software:

When you're requesting that someone do work for you entirely for free,
it helps very little to alienate your audience with ignorant,
offensive comments, and it helps even less to imply that, when your
attitude turns people off from spending time to work out the specifics
of your problem, the lack of immediate response is due to
incompetence.

Andrew

Reply all
Reply to author
Forward
0 new messages