possible sailfish overheat bug


Rich Olson

May 29, 2013, 7:43:58 PM
to jetty-f...@googlegroups.com
Running a Replicator 2 with Sailfish 7.4 r1062.

First off - I really like sailfish - not looking to alarm anyone - but I may have found an issue...  (or maybe I'm doing something stupid I haven't figured out)

Was doing a print the other day - and a few minutes after it completed I found the room stinking of overheated nylon (which I was printing in).  I shut the printer down - then pushed some PLA through the hot extruder to clean it out - which was so hot - it resulted in smoke!

(At this point I got out of the room)

Things seemed OK after the printer cooled down.  A test print came out fine.

Had initially suspected an intermittently bad thermocouple - but now suspect a bug in Sailfish.

The next day I retried the same print - and found that after the print completed - the printer started HEATING to 290C!

See video: http://www.youtube.com/watch?v=kqyIb3Q0JOg

I scoured the gcode for any hints why it might have done this - no dice.

The good news: This problem replicates reliably for me.

The problem seems highly specific to the gcode - I have had mixed luck trying to replicate after reslicing.

I have not been able to replicate the issue when slicing another object.

I originally sliced this using a custom profile on ReplicatorG / Skeinforge 0040r5.  I don't think the profile is relevant.

To replicate:

- Download http://www.nothinglabs.com/overheat.zip
- Open chainring.stl file in ReplicatorG 0040r20 (this will also load chainring.gcode)
- Print! (don't reslice)
    (The print is at 255c - and is very small - only takes about 3 minutes.)
- Immediately after print completes - bring up the status monitor.  If the problem replicates for you - it'll be heating to 290c!
- Panic a little - and turn off printer.

This has replicated very reliably for me with this -exact- file.  I just tested after adding a single comment line to the file - and it didn't replicate.  Weird (but maybe a clue).

This problem does not replicate using the stock Makerbot 7.2 firmware.  I have switched between sailfish and stock firmware several times.  Issue seems unique to Sailfish.

Override GCode Temp was off.  Pre-heat was only 245c.

It seems like if you click "OK" in ReplicatorG's "Print Done" dialog - it stops the heating cycle.

Had it not repro for me once - but then rebooted the printer - and it repro'd again.

Anyone see anything in the gcode that looks funky?

Anyone ever seen anything similar?  Thoughts?

Thanks,

-Rich

Bottleworks

May 29, 2013, 9:54:52 PM
to jetty-f...@googlegroups.com
I had something similar happen to me before on a rep1. It appeared to be related to a misread of the SD card. It only occurred once, but was quite scary.

Jetty

May 29, 2013, 10:02:30 PM
to jetty-f...@googlegroups.com
Can you upload your .x3g file that you're printing that's causing the issue?

Dan Newman

May 29, 2013, 10:17:35 PM
to jetty-f...@googlegroups.com

On 29 May 2013 , at 4:05 PM, Rich Olson wrote:

> Running a Replicator 2 with Sailfish 7.4 r1062.

1. The gcode was generated by RepG 40r5 - Sailfish? That rev *never* supported
Sailfish on the Rep 2. Please upgrade to a version of RepG 40r - Sailfish or later.

2. I cannot reproduce this on my Rep 2 with r1062 *when* I print from SD card. I can
reproduce it when I print directly from RepG 40r20 - Sailfish. Thus it appears to be a
RepG bug. Indeed, if you change the temp to 254 the problem goes away. Part of the
issue may well be that 255 is a magic temp to RepG -- it's a HBP gone wild. Other
code paths in RepG may also treat 255 in an odd way.

Oh and if I print from RepG and make the temp 257 other bad things happen. Again,
only from RepG. Not from SD card.

We'll look into this. However, chalk this up to yet another reason why people
constantly say to print from SD card. And you definitely do not want to print
from RepG unattended: the comms could go flaky and send a bad instruction like
heat to a high temp.

BTW, you probably don't see this with MBI's 7.2 firmware and RepG since MBI's 7.2
and earlier firmware has a bug and does incorrect things at temps > 240C. Sailfish
had inherited that bug from MBI but we spotted it, pointed it out to MBI, and fixed
it in r1062. MBI looks to be fixing it in their upcoming release.

Dan

Dan Newman

May 29, 2013, 10:32:11 PM
to jetty-f...@googlegroups.com
Also, FWIW,

1. It looks like RepG, confused by the 255, is sending some large temp which
Sailfish is limiting to 290. In prior releases, Sailfish had a lower max
limit. But as folks want to play with polycarbonate, they need higher temps.

2. MBI still has a lower max limit, but I believe they will be increasing theirs
as well. This is another possible reason why you don't see this from RepG with
MBI's current 7.2.

3. When RepG did this, I could see over on the bot that the new setpoint was 290C.
That's why I believe that RepG is sending a large value which Sailfish is limiting.
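
A minimal sketch of the limiting behavior described in points 1 and 3, assuming a firmware-side cap of 290C as stated above. The function and constant names are hypothetical and are not taken from the Sailfish source; the point is only that any oversized request ends up displayed as a 290C setpoint.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical constant: the thread says Sailfish r1062 limits setpoints to 290C.
constexpr uint16_t kMaxExtruderTempC = 290;

// Clamp a requested setpoint to the firmware maximum.
// A garbled or nonsensical request (for example 65535) comes out as 290.
uint16_t clampSetpoint(uint16_t requestedC) {
    return std::min(requestedC, kMaxExtruderTempC);
}
```

With a lower cap, the same garbled request would still be accepted, just at a less alarming temperature.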

Dan

Dan Newman

May 29, 2013, 11:44:04 PM
to jetty-f...@googlegroups.com
Sometimes this happens from RepG for me, sometimes it doesn't. It never
happens from SD card. Moreover, when it does happen to me from RepG, I
also see "packet timed out" errors from RepG. Indeed, I see them in
conjunction with the two G0 "rapid motion" commands. When the bot is
doing rapid motion is one of the times that USB comms is flaky as the
bot is a bit too busy to be doing USB comms. And, you can get
incorrectly interpreted commands. Your gcode has some special traits:

1. There's a G0 Z150 to rapidly move the build platform to the bottom
after the build is done.

2. Very shortly after that, a command to set the extruder temp to 0 is
sent.

3. Since the print is less than 1mm tall, the build plate runs a longer
distance than normal. Seems to be just far enough at high speed to
cause occasional USB comms problems.

I'm suspecting that 3 is the issue: there's a USB comms problem and
the set temp command is getting garbled and a high temp is read over the
garbled USB link. Now that Sailfish has moved from a temp limit of 260C to
290C this is very noticeable. Note that there is an idle timer which
will turn the extruder off after 30 minutes, but 290C is a bit high.

So, in the short term you can try changing the G0's to G1's and that
may help. Also, printing unattended over USB is never a good idea
owing to the USB comms issues.
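
As a rough illustration of the short-term workaround, here is a sketch that rewrites a single gcode line from a G0 rapid move to a G1 move. The helper name is made up for this example, and it only swaps the command word; it does not add or change a feed rate.

```cpp
#include <string>

// Rewrite a leading "G0" rapid move as a "G1" controlled move.
// Only an exact "G0" command word is matched, so lines such as
// "G04 P100" or "M104 S0" pass through unchanged.
std::string demoteRapidMove(const std::string& line) {
    const bool isG0 = line.compare(0, 2, "G0") == 0 &&
                      (line.size() == 2 || line[2] == ' ');
    return isG0 ? "G1" + line.substr(2) : line;
}
```

Applied to the end gcode, `G0 Z150 ( Send Z axis to bottom of machine )` becomes `G1 Z150 ( Send Z axis to bottom of machine )`.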

Dan

Dan Newman

May 29, 2013, 11:47:52 PM
to jetty-f...@googlegroups.com
Jetty and I will look into some possible safety measures here.

Presently, the Replicators behave like the ToMs where if the
final gcode did not turn the temp off, then the heater is left
on. A nice way to have things "pre heated" for the next print.
We could make the "M73 P100" force all heaters to 0 and then
update all the end gcode files to have that after the final
M104 command. Then there would have to be both a garbled M104
command and a garbled M73 command.
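
The proposed safety measure could look roughly like the sketch below. The `Heaters` struct and `onBuildProgress` function are hypothetical stand-ins, not Sailfish code; the sketch only shows the idea that a 100% build-progress notification zeroes every setpoint regardless of what earlier temperature commands requested.

```cpp
#include <array>
#include <cstdint>

// Hypothetical heater bank: two extruders plus a heated build platform.
struct Heaters {
    std::array<uint16_t, 3> setpointC{};  // 0 means "off"
};

// Handle a build-progress notification (gcode "M73 P<percent>").
// At 100% every heater setpoint is forced to 0.
void onBuildProgress(Heaters& heaters, uint8_t percent) {
    if (percent >= 100) {
        heaters.setpointC.fill(0);
    }
}
```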

Dan

Dan Newman

May 30, 2013, 1:49:24 AM
to jetty-f...@googlegroups.com
> Anyone see anything in the gcode that looks funky?

There is one odd thing with your gcode, the start.gcode
has the line

G0 X-145 Y-75 Z30 (move to waiting position off build plate)

I'm unsure of where that line came from. To the best of my knowledge
we've never distributed a start gcode file for the Rep 1 or Rep 2 with
a fast G0 move for the move to the waiting position. Using a G0 there
is not a good idea. Among other things, when you are printing over USB
it can cause the stepper motor currents to be set to incorrect values
while things are heating up. That really fast, long move will interfere with
USB comms and the immediately subsequent G130 command may get
garbled and one or more stepper drivers set to max current while
the bot is drawing power to heat up.

So please change that G0 to G1.

Dan



Rich Olson

May 30, 2013, 4:32:28 AM
to jetty-f...@googlegroups.com
To confirm - yes I was seeing "Packet timed out!" - sometimes several but it always seemed to be at the beginning of the print.

I tried swapping out the waiting position move from:


G0 X-145 Y-75 Z30 (move to waiting position off build plate)

To what 40r20 was using:

G1 X-141 Y-74 Z50 F3300.0 (move to waiting position)

This got rid of the "Packet timed out!" - but did -not- eliminate the overheating issue.

Manually set the temperature in the gcode to 245 - and the problem still replicated.  This issue does not seem specific to the number 255.  Tested this a few times to be sure (confirmed the printer was heating to 245c during the print).

I'm now fully aware that I generated the gcode in question with a non-supported version of ReplicatorG (I had thought I'd updated a while ago - but apparently hadn't).  Sorry for any confusion that caused.

I was also not able to repro using SD card.  I've attached a .x3g file reflecting the above gcode changes.  But again - it works fine when printing from SD.

I'll keep playing with things on this end - will report on any findings.

Thanks to everyone for looking into this.  Let me know if I can help in any other way.

-Rich
chainring.x3g

Rich Olson

May 30, 2013, 5:32:16 AM
to jetty-f...@googlegroups.com
Few more things.

Changed: G0 Z150 ( Send Z axis to bottom of machine )

To: G1 Z150 ( Send Z axis to bottom of machine )

Problem still occurred.

However - when I simply removed the line - I was not able to reproduce the problem.

(Then put the line back in and restarted fresh with the original "bad" version of the gcode.)

At the end of the file - I noticed there were duplicate places where the extruder was told to cool-down:

M104 S0
;M113 S0.0
M127
(******* End.gcode*******)
M73 P100 ( End  build progress )
G0 Z150 ( Send Z axis to bottom of machine )
M18 ( Disable steppers )
M104 S0 T0 ( Cool down the Right Extruder )

So - I tried removing:
M104 S0 T0 ( Cool down the Right Extruder )

And I was again -unable- to replicate the problem.  Seems like the first cool-down instruction was getting handled properly - but the second one was getting buggered up?

Dan Newman

May 30, 2013, 10:07:39 AM
to jetty-f...@googlegroups.com

On 30 May 2013 , at 1:32 AM, Rich Olson wrote:

> To confirm - yes I was seeing "Packet timed out!" - sometimes several but
> it always seemed to be at the beginning of the print.
>
> I tried swapping out the waiting position move from:
>
> G0 X-145 Y-75 Z30 (move to waiting position off build plate)
>
> To what 40r20 was using:
>
> G1 X-141 Y-74 Z50 F3300.0 (move to waiting position)
>
> This got rid of the "Packet timed out!" - but did -not- eliminate the
> overheating issue.

It's not expected to. It's more garbled comms at the end of the print.

Look, people have been saying for years to NOT PRINT OVER USB owing
to issues with garbled comms. This is what you are seeing.

> Manually set the temperature in the gcode to 245 - and the problem still
> replicated. This issue does not seem specific to the number 255.

Correct. It worked once or twice for me at 254 because the garbled
comms in conjunction with the "G0 Z150" doesn't always happen.

> Tested
> this a few times to be sure (confirmed the printer was heating to 245c
> during the print).
>
> I'm now fully aware that I generated the gcode in question with a
> non-supported version of ReplicatorG (I had thought I'd updated a while ago
> - but apparently hadn't). Sorry for any confusion that caused.
>
> I was also not able to repro using SD card. I've attached a .x3g file
> reflecting the above gcode changes. But again - it works fine when
> printing from SD.
>
> I'll keep playing with things on this end - will report on any findings.
>
> Thanks to everyone for looking into this. Let me know if I can help in any
> other way.

From all appearances your issue is simply USB garbled comms. This is why
folks say to not print over USB. The likely trigger is "G0 Z150" combined with
the fact that the build plate has to move nearly ALL of that Z150 since the
print is so small in Z height.

While we can take some steps to reduce the likelihood of this happening,
there already is the community tested and approved way of dealing with this:
DO NOT PRINT OVER USB.

Dan

Dan Newman

May 30, 2013, 10:11:13 AM
to jetty-f...@googlegroups.com

On 30 May 2013 , at 2:32 AM, Rich Olson wrote:

> Few more things.
>
> Changed: G0 Z150 ( Send Z axis to bottom of machine )
>
> To: G1 Z150 ( Send Z axis to bottom of machine )
>
> Problem still occurred.

Not at all surprising: it's a long move and Z quickly gets
to the top speed anyway. A "G1 Z50" would be more likely
to reduce the likelihood of the issue.

> However - when I simply removed the line - I was not able to reproduce the
> problem.

That further supports "garbled USB comms".

> (Then put the line back-in restarted fresh with the original "bad" version
> of the gcode)
>
> At the end of the file - I noticed there were duplicate places where the
> extruder was told to cool-down:

That's normal.
>
> M104 S0
> ;M113 S0.0
> M127
> (******* End.gcode*******)
> M73 P100 ( End build progress )
> G0 Z150 ( Send Z axis to bottom of machine )
> M18 ( Disable steppers )
> M104 S0 T0 ( Cool down the Right Extruder )
>
> So - I tried removing:
> M104 S0 T0 ( Cool down the Right Extruder )
>
> And I was again -unable- to replicate the problem. Seems like the first
> cool-down instruction was getting handled properly - but the second one
> was getting buggered up?

That further supports garbled USB comms as that last one is sent to the bot when
it's busy with that long, fast move.

Dan


Dan Newman

May 30, 2013, 11:07:05 AM
to jetty-f...@googlegroups.com
>> However - when I simply removed the line - I was not able to reproduce the
>> problem.
>
> That further supports "garbled USB comms".

BTW, USB comms between the host computer and USB chip in the bot should be
reliable (assuming a good implementation on the USB chip). USB has error
correction and packet retransmit.

However, between the bot's USB comms chip and the Sailfish processor, it's
just serial comms over a UART at high speed. When the Sailfish processor
is very busy at interrupt level running the stepper motors, the Sailfish
processor can be sluggish in acknowledging or otherwise responding to s3g
packets. That's the "packet timeout" you see in RepG. Additionally, it's
possible for data to be lost or corrupted. My guess is that bits may be
getting lost. And, I've always suspected that how the MBI firmware handles
this (and thus Sailfish as well) may be suboptimal as I do not hear about
this problem as much with other RepRap firmwares. It's something I generally
only read about with makerbots and goes back all the way to the Cupcake
days. Sailfish has inherited this code from the mainline MBI firmwares.
And faster motion exacerbates the problem as the bot is spending more time
at interrupt level driving the stepper motors. Especially so for longer
moves as they easily reach the top speeds.

Dan

Robert Trescott

May 30, 2013, 3:07:34 PM
to jetty-f...@googlegroups.com, Dan Newman
Hi Dan,
I agree with your USB analysis and understand the likelihood of lost or missing data via serial communications. I would like to make two suggestions.

-First, I know it's a real pain to 'fix' something that originated from MBI firmware, but seriously, I'd give up many of the new features added to Sailfish, if I could have proper error detected/corrected printing via USB. Every 2D printer can do it, CNC mills, scanners, keyboards, mice, thumbdrives, etc... all negotiate the electrical soup out there waiting to corrupt serial data just because it can!
Even if the machine could only reliably print at 60mm/min while using USB, that's a limitation that I could respect. However, I suspect there is a deeper bug in the firmware. I see packets lost all the time with or without the machine even moving any axis. I don't see a lot of motion interrupts colliding with my comms interrupts in that scenario. Something is definitely amiss somewhere in the comms firmware and perhaps a different strategy is in order. Circular rx/tx buffers, software STX/ETX, CRC16 error detect, etc... stuff you already know about I'm sure, but something different than what we have now. (Isn't the SD card a serial data xfer?)

-Second, if we can't expect USB comms to behave reliably during printing, perhaps it's safer to just disable that feature. That way instead of saying "you shouldn't print with USB" it's "you can't print with USB".

Just thought I'd weigh in on the conversation since this affects everyone using these machines.

Regards,
-Robert

 



Dan Newman

May 30, 2013, 3:32:51 PM
to jetty-f...@googlegroups.com

On 30 May 2013 , at 12:07 PM, Robert Trescott wrote:

> Hi Dan,
> I agree with your USB analysis and understand the likelihood of lost or missing data via serial communications. I would like to make two suggestions.
>
> -First, I know it's a real pain to 'fix' something that originated from MBI firmware, but seriously, I'd give up many of the new features added to Sailfish, if I could have proper error detected/corrected printing via USB. Every 2D printer can do it, CNC mills, scanners, keyboards, mice, thumbdrives, etc... all negotiate the electrical soup out there waiting to corrupt serial data just because it can!
> Even if the machine could only reliably print at 60mm/min while using USB, that's a limitation that I could respect. However, I suspect there is a deeper bug in the firmware. I see packets lost all the time with or without the machine even moving any axis. I don't see a lot of motion interrupts colliding with my comms interrupts in that scenario. Something is definitely amiss somewhere in the comms firmware and perhaps a different strategy is in order. Circular rx/tx buffers, software STX/ETX, CRC16 error detect, etc... stuff you already know about I'm sure, but something different than what we have now. (Isn't the SD card a serial data xfer?)
>
> -Second, if we can't expect USB comms to behave reliably during printing, perhaps it's safer to just disable that feature. That way instead of saying "you shouldn't print with USB" it's "you can't print with USB".
>
> Just thought I'd weigh in on the conversation since this affects everyone using these machines.

Well, Jetty plans to look at the RepG side. I've identified at least two ways that the
inbound comms to the ATmega 1280/2560/whatever will silently drop inbound bytes. Now
that eventually should cause a CRC error unless it's an entire s3g command lost. So,
then, the question becomes how well is RepG handling such an error response back? It
might not be.

The most likely way bytes get dropped is when the non-interrupt "host" slice isn't
getting enough cycles to process a completed s3g command packet and to "reset" the
packet input buffer. While the last s3g command packet is completed (in state "PS_LAST")
and before it is processed, additional incoming bytes off of the UART are silently dropped
on the floor. The UART code is interrupt driven and so it can pre-empt processing
in the host slice. Ditto the very busy stepper interrupt. My guess is that the
error handling in RepG is lacking. (We already know it is when writing files to SD card.)
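
To make the byte-dropping scenario concrete, here is a simplified sketch of an inbound packet state machine. Apart from the "last" state, which corresponds to the PS_LAST mentioned above, the state names, the `InPacket` struct, and the 0xD5 start byte are assumptions made for illustration; this is not the firmware's actual code, and the CRC check is omitted.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

enum class PacketState { Start, Len, Payload, Crc, Last };

struct InPacket {
    PacketState state = PacketState::Start;
    uint8_t expected = 0;          // payload length announced by the sender
    std::vector<uint8_t> payload;
    std::size_t dropped = 0;       // bytes discarded while waiting for reset()

    // Called from the UART receive interrupt for every incoming byte.
    void processByte(uint8_t b) {
        switch (state) {
        case PacketState::Start:
            if (b == 0xD5) state = PacketState::Len;  // assumed start byte
            break;
        case PacketState::Len:
            expected = b;
            payload.clear();
            state = expected ? PacketState::Payload : PacketState::Crc;
            break;
        case PacketState::Payload:
            payload.push_back(b);
            if (payload.size() == expected) state = PacketState::Crc;
            break;
        case PacketState::Crc:
            state = PacketState::Last;  // CRC verification omitted in this sketch
            break;
        case PacketState::Last:
            ++dropped;  // packet complete but not yet handled: byte is lost
            break;
        }
    }

    // Called from the non-interrupt host slice after handling the packet.
    void reset() {
        state = PacketState::Start;
        payload.clear();
    }
};
```

If the host slice is starved and `reset()` is late, every byte of the next command that arrives in the meantime increments `dropped` and never reaches the parser.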

Dan

Jetguy

May 30, 2013, 3:47:07 PM
to Jetty Firmware
My only response is, I hear you and agree with the points but that
simply won't fly.
It's really easy to sit back and armchair how you'd do it and all the
features that should be there.
I cannot even remotely take credit to know what Dan and Jetty went
through to get where we are today. These guys have put in countless
hours and solved hundreds of major bugs the real paid software/
firmware team hasn't even touched.

Face it, people are stupid. Masses of people are really dumb. 3D
printer owners posting in the forum, dumbest people on earth.
So if you tell these people you can't print from USB, they simply
won't use the firmware. Keep in mind, the same people cannot follow
the directions to do a proper reset of onboard settings to at least
get to a known good config.

The final kicker is, we are OUT OF SPACE. MakerBot chose to save a few
pennies and use 1280s instead of 2560s. We are simply out of code space
because of the demand for things like:
Long filenames
Large SD card support
Lengthy text menus
Stupid stuff like Digipots (a complete side discussion on that
disaster)

So feature requests like buffers and such really aren't going to
happen.
Unless someone wants to do an entire rewrite of the firmware and
really, the entire system all the way to Rep-G or whatever slicer,
extremely unlikely.
I'm sure these guys will try to solve some of these and of course
whatever safety aspects but you are barking up the wrong tree really.
Dan and Jetty are volunteers, not paid staff. This massive bug really
needs to be thrown in MakerBot's face. They are the ones selling the bot.
They are the ones resting on some really old firmware and some very
poor software development.

Again, you have valid complaints and I'm not disputing that, but also,
just trying to let you know what we collectively are up against.

Dan Newman

May 30, 2013, 4:30:17 PM
to jetty-f...@googlegroups.com

On 30 May 2013 , at 12:32 PM, Dan Newman wrote:

> Well, Jetty plans to look at the RepG side. I've identified at least two ways that the
> inbound comms to the ATmega 1280/2560/whatever will silently drop inbound bytes. Now
> that eventually should cause a CRC error unless it's an entire s3g command lost.

At which point -- entire s3g command lost -- that packet would not get ack'd and a different
error should ensue.

And, I'll mention, that the CRC used is the 8bit DOW CRC (aka, Dallas Semi 1-wire CRC).
It's not a CRC to get overly excited about -- it does miss errors. As can most any
CRC. Question is just one of statistics and where your comfort level is.
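
For reference, a bitwise sketch of the Dallas/Maxim 1-Wire CRC-8 (polynomial x^8 + x^5 + x^4 + 1, reflected form 0x8C, initial value 0). The function name is chosen for this example, and this is the textbook algorithm rather than the firmware's own routine, which may be table-driven or organized differently.

```cpp
#include <cstddef>
#include <cstdint>

// CRC-8 Dallas/Maxim (DOW): process each byte LSB first,
// XOR-ing in the reflected polynomial 0x8C whenever a 1 bit is shifted out.
uint8_t crc8Dow(const uint8_t* data, std::size_t len) {
    uint8_t crc = 0;
    for (std::size_t i = 0; i < len; ++i) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; ++bit) {
            const bool lsb = (crc & 0x01) != 0;
            crc = static_cast<uint8_t>(crc >> 1);
            if (lsb) crc ^= 0x8C;
        }
    }
    return crc;
}
```

The standard check string "123456789" yields 0xA1 with this CRC. Because the checksum is only 8 bits wide, heavily corrupted data can still pass by chance in roughly 1 of 256 cases, which is the statistical concern raised above.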

Anyway, as I wrote, there's an issue of how well RepG is handling error responses.

But, I'm personally in favor of making the "Build" button pop up a window indicating
that builds over USB should not be left unattended. Also, I've made some changes
such that on Replicators once the "build end notification" is received (M73 P100),
all heaters are unconditionally turned off. That also requires an update to all
the end.gcode files. But in this case, Richard had an unusual start gcode -- his
had G0 for the "move to wait position". We've never released rep2 start gcode with
that. For users running with their own start or end gcode, new safety measures may
go unseen.

Dan

Dan Newman

May 30, 2013, 6:13:21 PM
to jetty-f...@googlegroups.com

On 30 May 2013 , at 12:32 PM, Dan Newman wrote:

> Well, Jetty plans to look at the RepG side. I've identified at least two ways that the
> inbound comms to the ATmega 1280/2560/whatever will silently drop inbound bytes. Now
> that eventually should cause a CRC error unless it's an entire s3g command lost.

Well, looks like Sailfish inherited this from the MBI firmware,

if (in.hasError())
{
    // Reset packet quickly and start handling the next packet.

    /* out.reset();

    // Report error code.
    switch (in.getErrorCode()){
    case PacketError::PACKET_TIMEOUT:
        out.append8(RC_PACKET_TIMEOUT);
        break;
    case PacketError::BAD_CRC:
        out.append8(RC_CRC_MISMATCH);
        break;
    case PacketError::EXCEEDED_MAX_LENGTH:
        out.append8(RC_PACKET_LENGTH);
        break;
    default:
        out.append8(RC_PACKET_ERROR);
        break;
    }
    */
    in.reset();
    //UART::getHostUART().beginSend();
    //Motherboard::getBoard().indicateError(ERR_HOST_PACKET_MISC);

In case you don't speak C/C++ that's the code to handle an error on an inbound
packet from USB. It's commented out. If there's an error, no response is
sent back to RepG and the input state is reset.

So that right there is a big problem: you're printing and, say, the M6 command
gets lost and so the bot starts trying to extrude plastic with lukewarm
extruders. Etc.

Dan

Robert Trescott

May 30, 2013, 6:56:21 PM
to jetty-f...@googlegroups.com, Dan Newman
Dan and all,
I see the code is really turning a blind eye to any input errors, but I also understand that sometimes knowing that an error occurred gets me nowhere unless I can do something about it! The key is in how to acknowledge and respond to said errors. Same thing can be said about CRC check words. I use CRC16 which is a MODBUS standard and it is rather robust, but again what can we do if a CRC error is flagged?

From some other responses I've read, I'd like to emphasize that I'm not disparaging the stellar contribution you all have put into this effort, I'm grateful there are intelligent, solution-oriented people such as yourselves out there contributing selflessly for our common good.

What I've learned by following this forum: the more time you all spend on this project, the better it gets!
Cheers & Thanks,
-Robert


Jetguy

May 30, 2013, 7:13:27 PM
to Jetty Firmware
Robert, sorry if I came off harsh, there was a lot of sarcasm in my
last post.

I know you have the best intentions of getting this fixed and a high
technical understanding of the problem. It was just my concern with
where you were going.
I'd love to see an overhaul of a lot of this myself. I've personally
asked for a few features and I know they were a pain because of space
and code issues.
Again I know you had the best of intentions and were just airing your
thoughts and it's great you have ideas and suggestions to make it
better.

I just hope you are starting to see the magnitude of the problem at
hand. It's a huge mess of bad logic and bad concepts, mashed over
time, and warmed over with a bunch of patches. I'd say at least once a
week I get to see some insight into some new disaster in the code
uncovered. I also hope you see that the Sailfish team has done more to
correct bugs than MakerBot has ever dreamed about.
I feel sorry for those who don't run Sailfish. Stock firmware is a very
worrisome place to be, in my mind. We really are in the stone ages as
some of this code has got to be over 3 years old if not longer. Some
parts have been warmed over and half baked so many times it's really
bad. Race conditions, parallel logic, all sorts of just bad code. But
like a sweater, when you pull on that string, the whole thing can fall
apart. Imagine doing that in your spare time. So you find some huge
bug, you fix it, go into testing only to find out the fix uncovers
some other huge bug. That happens daily to Dan and Jetty.

Douglas Meyer

May 30, 2013, 7:25:22 PM
to jetty-f...@googlegroups.com
As a <retired> software developer who has often had to support ancient <VB, C> code, I can only echo Jetguy's comments. Every time I read one of the profuse posts, each full of experience, skill and knowledge (almost forgot lore), I wonder to myself "How can these guys do this? Do the other people in this community really understand and appreciate the service they're providing?" and then I think: "I'm glad I'm not in that bag. I would have been burned out after a month or two".

BRAVO!!!



Dan Newman

May 30, 2013, 7:27:24 PM
to jetty-f...@googlegroups.com

On 30 May 2013 , at 3:56 PM, Robert Trescott wrote:

> Dan and all,
> I see the code is really turning a blind eye to any input errors, but I also understand that sometimes knowing that an error occurred gets me nowhere unless I can do something about it!

Yup. And as I often have to explain to people, one of the harder
aspects about programming is dealing with all the errors. Preferably
correctly.

> The key is in how to acknowledge and respond to said errors. Same thing can be said about CRC check words. I use CRC16 which is a MODBUS standard and it is rather robust, but again what can we do if a CRC error is flagged?

Well, I think most of us will at least agree that errors should not be ignored
in this setting. And it's easy to now see what some of the printing errors are
while printing over USB. Dropped line segments, for example, may well be some of
the print quality issues people see from time to time when printing over USB.
At least the extruder steps are sent as relative steps. Otherwise, the next good
step would get all the plastic of the prior dropped ones! But then there's the
commands we don't want dropped and which pose serious problems…. Yeah, this would
go over well on a big, 10 ton CNC mill with an expensive array of tools it can
wield.

So the simplest, but not goal oriented, approach might be to have the firmware
abort the print and just send nasty-grams back to RepG until RepG takes a
hint. I write "not goal oriented" since at the end of the day, we just
want to print and throwing in the towel the first time an error occurs
is counter to that goal. At any rate, Jetty is planning on reviewing the RepG
code in a day or so, so we'll see what he turns up.

An issue with the RepG code is that error handling isn't centralized.
The packet is sent and a response duly noted. However, the response
is not acted upon by the central code which sends the packet and reads
the response. Handling the error is left up to higher level code.
In general, that makes sense and I'm sure you know why. But, it also
opens the door to uneven handling of errors. Some callers may even
be ignoring the errors. My guess is that the temp setting code does
something and that something involves sending back a now incorrect
temp. Yes, might make sense to have the errors which just require
a re-transmit to auto-magically be handled with a retry count down
in the bowels. I'm sure Jetty will come up with something.
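
To make the "retry count down in the bowels" idea concrete, here is a minimal sketch in Python (not RepG's actual Java code). Everything in it is an assumption for illustration: the result codes, the `transport` callable, and the retry limit of 3 are hypothetical and are not taken from RepG or the firmware protocol. The point is only that the one function which sends a packet also decides whether to retransmit, so callers cannot quietly ignore an error.

```python
from typing import Callable, Optional


class CommsError(Exception):
    """Raised when a packet cannot be delivered."""


# Hypothetical result codes; the real protocol's codes are not given in this thread.
OK = "ok"
RETRYABLE = {"crc_mismatch", "timeout"}


def send_with_retry(transport: Callable[[bytes], Optional[str]],
                    packet: bytes,
                    max_retries: int = 3) -> int:
    """Send `packet`, retransmitting on retryable errors.

    `transport` sends the packet and returns a result code, or None when no
    response arrived (treated here as a timeout). Returns the number of
    attempts used; raises CommsError if the packet never gets through.
    """
    attempts = 0
    while attempts <= max_retries:  # one initial attempt plus max_retries retries
        attempts += 1
        result = transport(packet)
        if result is None:
            result = "timeout"
        if result == OK:
            return attempts
        if result not in RETRYABLE:
            # Errors a retransmit cannot fix are passed up immediately.
            raise CommsError(f"unrecoverable error: {result}")
    raise CommsError(f"gave up after {max_retries} retries")
```

With something like this, higher level code only ever sees success or an exception, and an error that merely needs a re-transmit never reaches it.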

And thanks for the kind words and support. It is appreciated!

Dan

Dan Newman

May 30, 2013, 7:38:53 PM
to jetty-f...@googlegroups.com
> My guess is that the temp setting code does
> something and that something involves sending back a now incorrect
> temp.

But Dan, if no errors are sent back from the firmware then why would the
temp setting code do any "sending back"? It may not. But when RepG
doesn't get a response back after sending a packet, I think it considers
that an implied error.

Dan

Jetty

May 31, 2013, 10:53:22 AM
to jetty-f...@googlegroups.com
A few additional points on this discussion:

1. Aside from the USB corruption issue, we have long recommended (as have many others) printing from SD Card,
mainly because printing over USB at Sailfish speeds will affect print quality. With the print head
flying around at 120mm/s and the extra processing required for acceleration to achieve that speed, USB
comms is a massive overhead. The stepper interrupt runs with the highest priority we can give it, because
we know that if it doesn't get priority, then print quality suffers. Although USB corruption will likely contribute to
reduced print quality when printing over USB, testing we've done independently of that has shown us that USB
comms is costly, and print quality will suffer. Really, don't print over USB. It's a bad idea until we get
a faster processor, and it's something you should really sacrifice if you want to print at these speeds. Ultimately,
over USB you can't keep the acceleration buffer full, and it needs to be full to get that performance.

2. Like Dan, I'm in favor of showing a popup warning when printing over USB is attempted.
As for disabling USB printing completely, I believe that people should make
their own decision on that one. I can see cases (e.g. calibration) where someone might not want to use the SD card, and
after all, things like setting acceleration settings etc. are done over USB (but generally these things aren't done
when running at 120mm/s, so the issue is less). So USB comms will be staying.

3. It's correct to say that adding buffers and CRCs would solve the issue. However,
there are serious resource limitations. Currently we're CPU bound (these are 8-bit 16MHz processors with no floating point
hardware), code space bound (128K) and RAM bound (8K). For example, I'd love to increase the acceleration buffer
from 16 commands to 32 commands, but we can't; it wouldn't run in the 8K anymore.

We've literally used every trick in the book to get speed ups, from fixed point math instead of floating point, to assembler, to interrupts that can pre-empt themselves
(with code to handle that), along with compile time options that change the way the call stack is managed to eke
back space. We're at the point now where claiming back 200 bytes of code space is a big deal for us. E.g. I put some DigiPot
verification code in there the other day that might need to come out in future, because the cost (around 200 bytes) may outweigh the
potential benefit of having the DigiPots written correctly if something more important comes along that needs the space more.

4. These are all design decisions; none of them are right or wrong. We weigh the pros and cons of everything, between us and a
few active community members, to try and find the best solution. Where we can't really make our minds up, because the pros and cons
are equal, we'll post to this forum to garner a consensus.

However, that comes with a problem: others' requirements may be different. Personally I want to print at the fastest speed
possible with the best quality and, after that, have the coolest features that let me be more productive. That's kinda the mantra of Sailfish.
However, some want to print at 60mm/s and over RepG, and that's okay too. It's Open Source: fork it and do what you want; often
others may want to do the same as you. Sailfish was forked from MBI's software to add more standalone control, then
to speed up printing. I can see room for forks with other requirements.

5. Both Dan and I regard this USB comms issue pretty seriously. It's been bugging us for a while, and Rich
inadvertently finding this issue has given us information as to what's happening, and plausible reasons that are different from
what we had been putting it down to before. We'd assumed that the timeouts were due to increased load,
which was plausible, and we'd also assumed that timed-out packets would be retransmitted correctly. It looks like there
may be an issue there that was inherited and just highlighted by Sailfish putting more of a strain on the CPU.
We fully intend to fix this one, with the caveat that the fix doesn't require more resources. If it does and
we reach the point where something has to go to fix it, then it becomes a pro/con decision, and as it's possible to
print from SDCard, that may win.

I plan on looking further at the USB comms issue today and it'll likely take the form of hammering the comms from
RepG with dummy commands whilst printing from SDCard and getting a metric on what corruption / loss we're seeing
first, before looking for causes.  

Then I've got the blobbing issue to look at :-)

Jetty

Jun 1, 2013, 6:15:40 PM
to jetty-f...@googlegroups.com
We've added this message to the Build button. Printing from SDCard has always been recommended in the documentation anyway; this just highlights it.


[Attachment: Screen Shot 2013-06-01 at 3.49.41 PM.png]

Jetty

Jun 1, 2013, 9:59:58 PM
to jetty-f...@googlegroups.com
It's going up there every time for a few good reasons.

1. It's dangerous to use USB; Rich has illustrated that to us. Regardless of solving the USB transmission issues (which I've been doing all day,
and believe me, there was essentially no error checking in the comms, and this was inherited), there is still the chance that a
corrupted packet arrives with a valid CRC. That is a big risk.

2. RepG is changing to fail the print on certain comms errors after attempting 5 retries, because otherwise it's too dangerous to continue.

3. If you're printing via USB, you get worse print quality, period.

4. You will get more prints that fail mid print if you use USB anyway.

5. Some people are ignoring our warnings about this in general, which is fine, but you get a reminder if you do now.

6. If you're printing from SDCard as you should be, you won't get the popup anyway, so it won't be an issue.
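
To illustrate point 1 (a corrupted packet can still carry a valid CRC), here is a small Python sketch. It assumes an 8-bit Dallas/Maxim-style CRC and a two-byte little-endian payload; the thread does not say which checksum or packet layout the protocol actually uses, so both are assumptions for illustration only. The value 255 is chosen simply because the print in this thread ran at 255C.

```python
def crc8(data: bytes) -> int:
    """Dallas/Maxim-style CRC-8 (reflected polynomial 0x8C), computed bit by bit."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0x8C if crc & 1 else crc >> 1
    return crc


def colliding_values(value: int) -> list:
    """All other 16-bit values whose little-endian payload has the same CRC."""
    target = crc8(value.to_bytes(2, "little"))
    return [v for v in range(0x10000)
            if v != value and crc8(v.to_bytes(2, "little")) == target]


# Hypothetical payload: a 16-bit set point of 255.
collisions = colliding_values(255)
print(len(collisions), "of 65535 other payloads pass the same CRC check")
```

An 8-bit check can only take 256 distinct values, so among the 65535 other two-byte payloads, 255 share the original's checksum. A CRC makes undetected corruption unlikely, not impossible, which is the risk described above.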

I can't stress enough that it's still a VERY bad idea to print over USB. I recommend you read the Sailfish docs to find out why this is the case;
it's been mentioned on the forums a number of times too. It is tempting to remove USB printing completely, however there are some instances,
such as programming settings, where the risk is less and USB is currently required.

As you say you've never had a problem with it, that tells me you haven't done a lot of printing yet. The majority of us who have
done any amount of printing with Sailfish or the stock MBI RepG/Makerware have found that printing over USB causes issues, and when you're
in the middle of a 4 hour print, that's not good.

However, rest assured that if you continue to ignore our advice, USB comms will soon be more reliable / robust than they've ever
been. But if you have something like a poor USB cable, you can expect more of your prints to fail due to errors, as errors will
now be checked and not ignored.

On Saturday, June 1, 2013 6:56:47 PM UTC-6, DamianGto wrote:
I Really hope you do not put that up every time. I always use usb and never had problem with it.
Sure I know the code sucks for it ( right now)

Jetty

Jun 3, 2013, 3:45:22 PM
to jetty-f...@googlegroups.com
Just an update on this one.  We've identified 2 issues that were causing Rich's overheat problem.

1. MBI's firmware and Sailfish (we inherited the code) were not sending communication errors back to the
host PC when printing over USB. ReplicatorG was also not handling the errors (even if they were sent).

In Rich's case this was causing issues with repeatability, but it will also cause general issues with USB comms: timeouts,
overheating, command corruption, commands ignored etc.
If you've ever had weird things happening when printing over USB, this is likely your cause.

2. There is a bug in Sailfish that could occasionally cause heating set point commands to be corrupted.
The chance of this is about 1 in 512 temperature sets, and depends on the positioning of the command in the command
stream. It would affect prints from both SD Card and USB, but the same print from the two mediums would have different
command positioning. We've had the occasional report from others that heaters were left on after a print etc. Although this
could also be due to point 1 above, it could also be caused by this bug.
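
For a rough sense of what "about 1 in 512 temperature sets" adds up to, here is a back-of-the-envelope Python sketch. It treats each temperature set as an independent 1-in-512 event, which is a simplifying assumption: as noted above, the bug actually depends on where the command sits in the command stream, so a given file either hits it or doesn't.

```python
def chance_of_corruption(temperature_sets: int, per_set: float = 1 / 512) -> float:
    """Probability that at least one of `temperature_sets` commands is corrupted,
    assuming (for illustration only) independent events with probability `per_set`."""
    return 1.0 - (1.0 - per_set) ** temperature_sets


for n in (1, 10, 100, 512):
    print(f"{n:4d} temperature sets: {chance_of_corruption(n):.1%}")
```

Under that assumption, a single print is very unlikely to be hit, but across a few hundred temperature sets the chance of seeing it at least once becomes substantial, which would fit with only occasional reports coming in.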

Both the above issues have been fixed for the next release of Sailfish and RepG. I would class it as a "Required" upgrade.

A few notes about this new USB comms handling.  

a) Because errors are now handled, you will (potentially) see more errors than before.
They show up as "Red" in the RepG log. If you are getting many errors, then try another USB cable or a shorter cable.
You had the problem all the time; it just wasn't showing before.

b) If you have errors that can't be recovered from, your print will now likely cancel itself mid print. This isn't a problem,
because it can be dangerous to continue on error anyway.

c) Makerware fails to connect over USB to this new firmware with error handling when an error is sent back.
This likely isn't a problem for you as generally Makerware won't connect to newer versions of Sailfish anyway over USB.
You can still print to SD Card with Makerware anyway.

d) We need extensive testing from the community on the USB changes.  Although all the common stuff has / is
being tested, there's the possibility that some things we haven't thought of, may no longer work, please let us know.

Finally, although USB comms is now more reliable than it was before, it's still "Very Highly Recommended" that you
always print from SD Card, because it's more reliable and your print quality will be better as the acceleration buffer
is kept fuller when printing from SD Card and you are less vulnerable to Host PC issues.



whpthomas

Jun 5, 2013, 12:09:25 PM
to jetty-f...@googlegroups.com
> d) We need extensive testing from the community on the USB changes.  Although all the common stuff has / is
> being tested, there's the possibility that some things we haven't thought of, may no longer work, please let us know.

So I take that what you are really asking us to do is print from USB with the new beta release - just a bit to see if we run into any bugs - even though you strongly advise against it =)

Do any parrots come with that?

Jetty

Jun 5, 2013, 1:34:36 PM
to jetty-f...@googlegroups.com
> So I take that what you are really asking us to do is print from USB with the new beta release - just a bit to see if we run into any bugs - even though you strongly advise against it =)

That's correct, gotta love the irony of it ;-)
 
> Do any parrots come with that?

The issue we have is that we're short on code space, and the voice library would be around 4MB, so not at this stage.
Maybe if you can skip the interactive talking parrot, then we might be able to have fixed speech via mp3s, especially if
we dropped a few non-critical things like Pause @ ZPos, Printing and LED color etc...

Whatever happened to parrot guy anyway?

Craig Bisgeier

Jun 5, 2013, 3:17:20 PM
to jetty-f...@googlegroups.com
Silly Question, but how hard would it be to select a newer Arduino with better capability and port over the code to it?  Not the simplest thing in the world I'm sure but would it even be possible?  Or worthwhile?

Just a thought from a relative noob...

Jetty

Jun 5, 2013, 3:20:47 PM
to jetty-f...@googlegroups.com
Essentially there isn't one that's better and works without changing other hardware or resoldering surface mount.

MacGyver

Jul 14, 2013, 12:56:36 PM
to jetty-f...@googlegroups.com
This USB warning every time I print, with no way to turn it off, is starting to get very annoying.  You should at least add an option to turn it off.  I've printed 1000's of models never using the SD card reader just fine.

Andrew

Jul 15, 2013, 2:25:46 PM
to jetty-f...@googlegroups.com
I'm going to +1 this.  I have printed over 300 objects over USB (short cable, OSX) with not one comm issue (that caused any print problem).  Of my last 5 attempts to print via SD, three of them failed by way of a stopped nozzle.  Not extruding, but parked on the print at last height and at full heat.  I have tried three different SD cards, all new 1gb cards formatted in OSX as FAT.  For me, at least, USB appears more reliable!

David Lancaster

Jul 15, 2013, 9:03:57 PM
to jetty-f...@googlegroups.com
++1 from me.  I understand its limits and avoid using it for long prints, but my Cupcake tends to glitch about 20% of the time when the SD card is removed/inserted (power spike?) and requires rebooting, so I tend to do small/simple prints directly over USB.  They're generally simple geometry and print just fine (especially given the limited speeds of the Cupcake motors).

Having a preference to disable the warning, or just showing it once, or disabling it via a config file entry would be nice...

D.



Jetty

Jul 15, 2013, 9:47:58 PM
to jetty-f...@googlegroups.com
1. Printing via USB does create errors. Most of the time these will be caught and automatically retransmitted, but occasionally
bad data will get through to your bot. This can result in bad prints or, in extreme circumstances, damage to your bot
and / or to property.

2. Printing via SD Card results in higher quality prints because the acceleration buffer is kept closer to capacity more
of the time. Whenever the acceleration buffer drains (which happens frequently on USB prints), you get blobbing as the movement
has to come to a complete stop.

3. Clicking one extra "OK" button to acknowledge this, if you choose to ignore the advice provided, shouldn't be that big a deal.
You already have to click a number of buttons to get a print to output anyway.
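
As a toy illustration of point 2, the Python sketch below models a 16-slot command buffer that the machine drains at one command per tick. It is not how the Sailfish planner works; the tick model, the delivery rates, and the length of the pause are all made-up assumptions. It only shows why a source that pauses for longer than the buffer can cover forces the motion to stop.

```python
def count_stalls(arrivals, capacity=16):
    """Count ticks on which the buffer is empty when the machine needs a command.

    arrivals[i] is how many commands the source delivers on tick i (hypothetical
    numbers); the machine executes one buffered command per tick.
    """
    queued = 0
    stalls = 0
    for delivered in arrivals:
        queued = min(capacity, queued + delivered)  # extra commands wait at the source
        if queued == 0:
            stalls += 1   # nothing to execute: movement has to stop
        else:
            queued -= 1   # execute one command
    return stalls


steady = [2] * 60                          # a source that always keeps up
bursty = [2] * 20 + [0] * 30 + [2] * 10    # same rate, but with one 30-tick pause

print("steady source stalls:", count_stalls(steady))
print("bursty source stalls:", count_stalls(bursty))
```

In this model the steady source never stalls, while the bursty one stalls once its pause outlasts what the buffer held. A deeper buffer would ride out longer pauses, which is why the 16-command limit mentioned earlier in the thread matters.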

That said, the reminder seems to be bugging some users. Equally, it's of sufficient concern that allowing it to
be disabled permanently creates the possibility of users forgetting about the issue, or disabling it accidentally or because they
didn't really understand the issue.

One possibility is to have it popup once per session of ReplicatorG, i.e. similar to how MBI handled the "Acceleration Warning"
that pops up the first time you try to print in RepG for a session.

I've created a poll with the 2 options; let's see how many people feel strongly about this before we implement it.
Essentially it comes down to good design versus user experience, and in this instance they conflict.  :-)


Thanks