I'm getting this error:
Traceback (most recent call last):
  File "cpconvert.py", line 848, in <module>
    main()
  File "cpconvert.py", line 821, in main
    version,stories,images = importStories(verbose)
  File "cpconvert.py", line 679, in importStories
    for line in storiesCSV:
_csv.Error: line contains NULL byte
Pretty straightforward, but I don't know a whole lot about Python. Can anybody help?
Hey Matt,
I'm not positive, going off memory, but I think you need to do a find-and-replace on the null byte in the file. You can replace it with a space or with nothing.
Daniel
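For a file too large to open comfortably in an editor, the find-and-replace Daniel describes could be done in a streaming fashion. The following is a minimal sketch, assuming Python 3; the function name and the file names in the usage note are hypothetical and are not part of cpconvert.py.

```python
def strip_nul_bytes(src_path, dst_path, chunk_size=1024 * 1024):
    """Copy src_path to dst_path, dropping every NUL (0x00) byte.

    Returns the number of NUL bytes that were removed.
    """
    removed = 0
    # Binary mode: no decoding is attempted, so damaged text cannot raise
    # a UnicodeDecodeError, and CRLF line endings pass through untouched.
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            removed += chunk.count(b"\x00")
            dst.write(chunk.replace(b"\x00", b""))
    return removed
```

Calling something like strip_nul_bytes("stories.csv", "stories_clean.csv") (hypothetical file names) writes a cleaned copy and leaves the original export untouched, so the conversion can be retried against either file.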
Fixed the null byte issue, but now I'm getting "list index out of range". Most of the data is normalized, but a few of the stories (the ones containing NULL bytes) have really weird characters in them, and there are newlines in the story text that might be throwing it off.
I checked the encoding of the CSV file and it's telling me: Non-ISO extended ASCII HTML document text, with very long lines, with CRLF line terminators.
The error I get is:
Traceback (most recent call last):
  File "cpconvert.py", line 848, in <module>
    main()
  File "cpconvert.py", line 821, in main
    version,stories,images = importStories(verbose)
  File "cpconvert.py", line 722, in importStories
    story = [line[0],line[2],line[3],line[4],line[6],line[7],line[8],line[5]]
IndexError: list index out of range
And here's what the last few lines of text in my CSV before the error look like (in verbose mode, answering "yes" to "would you like to run a test"):
$%?m(?O` "<p>
Q
A
"
?
N<p>
qra?\
i
O[?HD??=?A~r
-/ ?="#?+$?+%<p>
Any ideas?
I haven't dealt with this in well over a year, but it looks like a story is screwing up the CSV parsing: the parser thinks that "$%?m(?O` "<p>" is the title, "Q" is the author, and so on, and then the line ends before the script expects it to, causing the error. I'd find that line in your CSV and remove or replace it. Ideally, look closely and try to figure out why the CSV parser would choke on it (are there lots of commas and quotation marks? That's a recipe for issues, though I believe the script should automatically escape a lot of them) and then do a general find-and-replace throughout the whole file.
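One way to locate the rows that end early is to scan the file with the csv module and report every record that has too few fields. This is a rough sketch, assuming Python 3 and that the NULL bytes have already been removed. The threshold of 9 fields is inferred from the traceback, which indexes up to line[8]; the function name and the latin-1 encoding (chosen only because it can decode any byte) are assumptions.

```python
import csv


def find_short_rows(csv_path, min_fields=9, encoding="latin-1"):
    """Return (record_number, last_physical_line, field_count) for short rows."""
    short = []
    # newline="" lets the csv module handle CRLF endings and newlines
    # embedded inside quoted fields on its own.
    with open(csv_path, newline="", encoding=encoding) as handle:
        reader = csv.reader(handle)
        for record_number, row in enumerate(reader, start=1):
            if len(row) < min_fields:
                # reader.line_num is the physical line on which the record ended.
                short.append((record_number, reader.line_num, len(row)))
    return short
```

The physical line number is reported alongside the record number because a story with embedded newlines spans several lines in a text editor, so the two counts drift apart.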
I figured it out: it's mangled text resulting from pasting out of a Word document into the College Publisher editor.
Behold:
OO 5 0 @ G à *Times New Roman5 à éSymbol3 à Arial3 à Times*"" 1 Ü - h jf jf E † # ! * ® ùùé 0K ` By Daniel Lopez JMC JMC <p>
ÿôÅU Oh 'â +' Y0X Ö à " - ÿ I _<p>
<p>
, 8 @ H P ' By Daniel Lopez rd y D JMC MC *Normale JMC 1C Microsoft Word 8.0*d@@& aÖj™ @& aÖj™ E † <p>
'O'£. ç¢ +, íD 'O'£. ç¢ +, í< _h p | • Ü å £ " ù<p>
Manual search and replace would be fine if my computer could handle a million-line text file, but it's having a little bit of trouble with that. Any suggestions?
Also, if you can, you may want to leave the stories with junk data out anyway and re-import them manually. I don't think it's worth corrupting your archives with junk data and causing trouble for the next migration.
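Setting the damaged stories aside can also be scripted rather than done by hand. The sketch below assumes Python 3, a file with its NULL bytes already removed, and a hypothetical rule that a usable story row has at least 9 fields (inferred from the line[8] index in the traceback). The function and file names are illustrative and do not come from cpconvert.py.

```python
import csv


def quarantine_rows(src_path, clean_path, rejected_path,
                    expected_fields=9, encoding="latin-1"):
    """Write well-formed rows to clean_path and the rest to rejected_path.

    Returns (clean_count, rejected_count).
    """
    clean_count = rejected_count = 0
    with open(src_path, newline="", encoding=encoding) as src, \
         open(clean_path, "w", newline="", encoding=encoding) as clean, \
         open(rejected_path, "w", newline="", encoding=encoding) as rejected:
        clean_writer = csv.writer(clean)
        rejected_writer = csv.writer(rejected)
        for row in csv.reader(src):
            # Assumed rule: a row with too few fields cannot be a complete story.
            if len(row) >= expected_fields:
                clean_writer.writerow(row)
                clean_count += 1
            else:
                rejected_writer.writerow(row)
                rejected_count += 1
    return clean_count, rejected_count
```

A field count is only a heuristic: a junk row can still happen to contain enough commas, and an unbalanced quotation mark can swallow the rows that follow it, so the rejected file is worth reading through before anything is re-entered by hand.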
Did it with TextMate. Offending stories were removed from the CSV and pasted elsewhere so they can be manually re-inserted. Thank you guys so much for developing this, now that we've got the archives we'll be able to get back off of College Publisher next semester. Awesomesauce!
Now- anybody have ideas about hosting that will placate the journalism department's concerns? I offered to put up my own money for a VPS but they're not having that, and I don't want to run WordPress on the Windows machines we have here.
-Matthew
--
You received this message because you are a part of CoPress (http://www.copress.org/).
- To post a message to this group, send email to copress@googlegroups.com
- To unsubscribe from this group, send an email to copress+unsubscribe@googlegroups.com
- For more options, visit this group at http://groups.google.com/group/copress
- Get connected on Twitter http://www.twitter.com/copress or Facebook http://www.facebook.com/copress
Ouch- almost got there, but then I got a message from WordPress 3.0 that the files produced by the script are not valid WXR. Says in the file that the generator is WordPress 2.7.1.
On Mar 7, 2011, at 4:44 PM, Andrew Spittle wrote:
> We had really good success with WebFaction when CoPress was running. It's also where I have my personal site hosted. Pretty solid.
> --
> Andrew Spittle | andrewspittle.net
Did it fail uploading, or just give you the error? As far as I can remember, we just arbitrarily wrote the generator number.
If it still doesn't work, do you mind doing a comparison between the file the script generated and the newer WXR files? I can help improve it this weekend or early next week
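The comparison Daniel asks for could start with a structural check: confirm that both files are well-formed XML, then list which element names appear in one file but not the other. This is a minimal sketch, assuming Python 3 and the standard-library xml.etree.ElementTree parser. It says nothing about what the WordPress importer actually validates, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def collect_tags(wxr_path):
    """Return the set of element tags in the file, or None if it is not well-formed XML."""
    try:
        tree = ET.parse(wxr_path)
    except ET.ParseError as error:
        print(f"{wxr_path}: not well-formed XML ({error})")
        return None
    # Namespaced tags come back as "{namespace-uri}localname".
    return {element.tag for element in tree.iter()}


def compare_wxr(generated_path, reference_path):
    """Return (tags only in the reference export, tags only in the generated file)."""
    generated = collect_tags(generated_path)
    reference = collect_tags(reference_path)
    if generated is None or reference is None:
        return None
    return sorted(reference - generated), sorted(generated - reference)
```

Here the reference file would be an export produced by the WordPress version doing the import. Elements that show up only in the reference export are a reasonable first place to look, though a difference in element names alone does not prove that it is the cause of the "not valid WXR" message.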
On Mar 7, 2011 7:49 PM, "Matthew Gerring" <beatpa...@gmail.com> wrote:
> Ouch- almost got there, but then I got a message from WordPress 3.0 that the files produced by the script are not valid WXR. Says in the file that the generator is WordPress 2.7.1.
What version of the WordPress Importer plugin are you running? Be sure you're on 0.3 and preferably running PHP 5.2. If that still doesn't work, try adding this to wp-config.php, for expanded error reporting:
> On Mar 7, 2011, at 4:44 PM, Andrew Spittle wrote:
> We had really good success with WebFaction when CoPress was running. It's also where I have my personal site hosted. Pretty solid.
> --
> Andrew Spittle | andrewspittle.net
> On Monday, March 7, 2011 at 4:43 PM, Matthew Gerring wrote:
> Did it with TextMate. Offending stories were removed from the CSV and pasted elsewhere so they can be manually re-inserted. Thank you guys so much for developing this; now that we've got the archives, we'll be able to get back off of College Publisher next semester. Awesomesauce!
> Now- anybody have ideas about hosting that will placate the journalism department's concerns? I offered to put up my own money for a VPS but they're not having that, and I don't want to run WordPress on the Windows machines we have here.
> -Matthew
> On Mar 7, 2011, at 4:36 PM, Daniel Bachhuber wrote:
> Also, if you can, you may want to ignore the stories with junk data anyway and do a manual reimport. I don't think it's worth corrupting your archives, and causing troubles for the next migration, etc. with junk data.
> On Mar 8, 2011, at 7:15 AM, Miles Skorpen <mi...@milesskorpen.com> wrote:
> I haven't dealt with this in well over a year, but it looks like a story is screwing up the CSV parsing: it thinks that "$%?m(?O` "<p>" is the title, "Q" is the author, etc., and then the line ends before the script expects it to, causing the error. I'd find that line in your CSV and remove or replace it. Ideally, look closely and try to figure out why the CSV parser would choke on it (are there lots of commas and quotation marks? That's a recipe for issues, though the script should automatically escape a lot of them, I believe) and then do a general find-and-replace throughout the whole file.
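For anyone hitting the same two errors, here is a minimal sketch of the check described above. It is not the actual cpconvert.py code. It strips NUL bytes before parsing and sets aside any row that is too short for the script to index, so those stories can be fixed or re-entered by hand. It assumes Python 3 and a minimum of 9 columns (the traceback indexes up to line[8]); the helper name split_rows is made up for illustration.

```python
import csv

# Assumption: the traceback reads line[8], so a usable row needs 9+ fields.
MIN_FIELDS = 9


def split_rows(lines, min_fields=MIN_FIELDS):
    """Parse CSV lines and return (good_rows, bad_rows).

    bad_rows holds (row_number, row) pairs for rows with too few fields,
    which is what triggers "list index out of range" later on.
    """
    # Drop NUL bytes first; the csv module refuses to parse them.
    cleaned = (line.replace("\x00", "") for line in lines)
    good, bad = [], []
    for number, row in enumerate(csv.reader(cleaned), start=1):
        if not row:
            continue  # blank line, nothing to import
        if len(row) >= min_fields:
            good.append(row)
        else:
            bad.append((number, row))
    return good, bad


if __name__ == "__main__":
    # Tiny in-memory sample; a real run would pass an open file instead.
    sample = [
        "id,x,title,author,date,body,a,b,c\r\n",
        "ju\x00nk,Q\r\n",
    ]
    good, bad = split_rows(sample)
    print(len(good), "usable rows")
    for number, row in bad:
        print("row", number, "is short:", row)
```

To run it against a real export, something like `open("stories.csv", newline="", encoding="latin-1")` should work as the `lines` argument. The file name is a placeholder, and latin-1 is only a guess that fits the "Non-ISO extended ASCII" report, chosen because it can decode any byte without raising.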
Also, forgot this, but two hosting options I'd consider if you have any budget: http://page.ly/ and http://wpengine.com/. They offer WordPress-specific support, so I think they're better suited for student publications.
On Mon, Mar 7, 2011 at 5:08 PM, Matthew Gerring <beatpa...@gmail.com> wrote:
> Great news- upgrading to WordPress 3.1 fixed it. Also, Daniel, you have a user account in this database. Such a small world!