patch tag state problems

Thomas Hartman

Jun 14, 2010, 8:50:11 AM
to HA...@googlegroups.com
Hi, I'm sharing an experience report from patch tag.

Yesterday the website had some problems: users just saw a blank screen
for a few hours, and eventually this changed to a 404. The
problem seems to be related to macid state loading at app start. I had
a large number of 0 byte checkpoint and event files (below).

In the server logs, what I was seeing was that patch tag was starting,
seg faulting, and then restarting again. (It is run from inside a bash
while true loop).

When I moved the 0 byte state files to a safe place, the website
became usable again.

Any ideas what could be the underlying problem, or why moving the
checkpoint files away fixed things?

I am going to try more frequent checkpointing, mightybyte's
alternative macid library, which supposedly uses a factor of ten less
memory (at the cost of more CPU, iiuc), and maybe a memory upgrade.


thartman@patch-tag:~/0bytestate>ls -lth /home/thartman/0bytestate/ |
tac | grep -i 'check' -A2 | head -n25
-rw-r--r-- 1 root root 0 Feb 25 22:44 checkpoints-0000041820
-rw-r--r-- 1 root root 0 Feb 25 22:44 events-0000041821
-rw-r--r-- 1 root root 0 Feb 25 22:45 events-0000041822
--
-rw-r--r-- 1 root root 0 Mar 6 05:12 checkpoints-0000041914
-rw-r--r-- 1 root root 0 Mar 6 05:13 events-0000041915
-rw-r--r-- 1 root root 0 Mar 6 05:18 events-0000041916
--
-rw-r--r-- 1 root root 0 Mar 29 05:48 checkpoints-0000042021
-rw-r--r-- 1 root root 0 Mar 29 06:06 events-0000042024
-rw-r--r-- 1 root root 0 Mar 29 06:07 events-0000042029
--
-rw-r--r-- 1 root root 0 Jun 14 02:02 checkpoints-0000043624
-rw-r--r-- 1 root root 0 Jun 14 02:02 events-0000043625
-rw-r--r-- 1 root root 0 Jun 14 02:02 events-0000043626
--
-rw-r--r-- 1 root root 0 Jun 14 02:04 checkpoints-0000043634
-rw-r--r-- 1 root root 0 Jun 14 02:04 events-0000043635
-rw-r--r-- 1 root root 0 Jun 14 02:04 events-0000043636
-rw-r--r-- 1 root root 0 Jun 14 02:04 checkpoints-0000043636
-rw-r--r-- 1 root root 0 Jun 14 02:04 events-0000043637
-rw-r--r-- 1 root root 0 Jun 14 02:04 events-0000043638
--
-rw-r--r-- 1 root root 0 Jun 14 02:06 checkpoints-0000043655
-rw-r--r-- 1 root root 0 Jun 14 02:15 events-0000043656

--
Need somewhere to put your code? http://patch-tag.com
Want to build a webapp? http://happstack.com

Jeremy Shaw

Jun 14, 2010, 9:48:54 AM
to ha...@googlegroups.com
On Mon, Jun 14, 2010 at 7:50 AM, Thomas Hartman
<thomash...@googlemail.com> wrote:
> Hi, I'm sharing an experience report from patch tag.

Sweet! More people should do that! Otherwise we won't know what to fix!

> Yesterday the website had some problems: users just saw a blank screen
> for a few hours, and eventually this changed to a 404. The
> problem seems to be related to macid state loading at app start. I had
> a large number of 0 byte checkpoint and event files (below).
>

> Any ideas what could be the underlying problem, or why moving the
> checkpoint files away fixed things?

I don't think happstack-state does the ideal thing when writing
checkpoint files at the moment. It should probably write the
checkpoint to a temporary file, and then rename it to be a
checkpoint file once the write has completed successfully. I am
guessing that something funky happened that caused a checkpoint file
to go bad, and then it crashed when trying to read the 0-byte
checkpoint file from then on. So, going back to the last good state
fixed things? We should be able to test what happens if you have a
0-byte checkpoint file pretty easily.
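
For illustration, the write-to-a-temporary-file-then-rename idea can be sketched roughly as follows. This is only a sketch under stated assumptions, not what happstack-state does: `writeFileAtomic` is a hypothetical helper name, the checkpoint is assumed to be already serialized to a strict ByteString, and the temporary file is created in the same directory so the rename stays on one filesystem. It does not fsync, so it guards against a crashed or interrupted writer rather than against power loss.

```haskell
import Control.Exception (bracketOnError)
import qualified Data.ByteString as B
import System.Directory (removeFile, renameFile)
import System.FilePath (takeDirectory, takeFileName)
import System.IO (hClose, openBinaryTempFile)

-- | Hypothetical helper (not part of happstack-state): write the bytes to a
-- temporary file next to the target, then rename it over the target.
-- A reader opening 'path' sees either the old file or the complete new one.
writeFileAtomic :: FilePath -> B.ByteString -> IO ()
writeFileAtomic path bytes =
  bracketOnError
    -- Create the temp file in the target's directory (same filesystem).
    (openBinaryTempFile (takeDirectory path) (takeFileName path ++ ".tmp"))
    -- On any exception: close the handle and discard the partial temp file.
    (\(tmp, h) -> hClose h >> removeFile tmp)
    -- Normal path: write everything, close, then move into place.
    (\(tmp, h) -> do
        B.hPut h bytes
        hClose h
        renameFile tmp path)
```

With something like this, a writer that dies half-way leaves at most a stray temporary file behind, and the previous checkpoint stays intact under its real name.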

I have added this to the 0.7 release on the roadmap
http://code.google.com/p/happstack/wiki/RoadMap?ts=1276523252&updated=RoadMap.
Though I would happily take a patch before then ;)

> I am going to try more frequent checkpointing, mightybyte's
> alternative macid library, which supposedly uses a factor of ten less
> memory (at the cost of more CPU, iiuc), and maybe a memory upgrade.

How does that work?

- jeremy

Jeremy Shaw

Jun 14, 2010, 10:05:55 AM
to ha...@googlegroups.com
On Mon, Jun 14, 2010 at 8:48 AM, Jeremy Shaw <jer...@n-heptane.com> wrote:
>> I am going to try more frequent checkpointing, mightybyte's
>> alternative macid library, which supposedly uses a factor of ten less
>> memory (at the cost of more CPU, iiuc), and maybe a memory upgrade.
>
> How does that work?

Oh, maybe you mean his compact-ixset stuff? Perhaps we should
make that more widely known/available?

- jeremy

MightyByte

Jun 14, 2010, 10:17:19 AM
to ha...@googlegroups.com
On Mon, Jun 14, 2010 at 10:05 AM, Jeremy Shaw <jer...@n-heptane.com> wrote:
>
> Oh, maybe you mean his compact-ixset stuff? Perhaps we should
> make that more widely known/available?

Yes, I think that's what he means. I've mentioned it a couple times.
I only got the code compiling--never got around to testing it. That
work is also old now. About a month ago I upgraded my app to the
latest version of happstack and was quite pleased to discover that my
memory usage dropped by 50%! It might be better to rewrite
compact-ixset based on the most recent ixset that has this improved
behavior.

Gracjan Polak

Jun 14, 2010, 11:16:53 AM
to HAppS


I have quite a lot of pending ixset changes to share! Those are mostly
CPU oriented, but it is quite possible that those also reduce RAM
usage. Stay tuned!

Gracjan Polak

Jun 14, 2010, 5:58:33 PM
to HAppS

From my code investigation:

    loadCheckpoints saver
        = do checkpointss <- readerGetUncut saver
             case checkpointss of
               [checkpoints] -> return checkpoints
               []            -> return M.empty
               _             -> error "Failed to read checkpoints."

So whatever happens here, it is safe. Let's look at readerGetUncut:

    readerGetUncut = do logMF NOTICE "fileReader: readerGetUncut"
                        allData <- B.readFile file
                                     `catchE` \_ -> return B.empty
                        return $ parseAll
                                   (L.fromChunks [allData])

If anything goes wrong in readFile, it is caught. Let's see whether it
is OK for allData to be empty:

    parseAll = loop
        where loop l | L.null l = []
              loop l = let (a,rest) = deserialize l
                       in a:loop rest

Yes, there is a special guard here to protect against empty files.
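
A small standalone sketch of that guard, for anyone who wants to poke at it. The `parseAll` loop is the one quoted above; `deserialize` here is a made-up stand-in (one length byte followed by that many payload bytes), not happstack-state's real deserializer, so this only demonstrates the empty-input path and says nothing about truncated or corrupt non-empty files.

```haskell
import qualified Data.ByteString.Lazy as L

-- Stand-in deserializer (an assumption for this sketch only): a record is
-- one length byte followed by that many payload bytes.
deserialize :: L.ByteString -> (L.ByteString, L.ByteString)
deserialize l = case L.uncons l of
  Nothing        -> error "deserialize: empty input"
  Just (n, rest) -> L.splitAt (fromIntegral n) rest

-- Same shape as the quoted parseAll: the L.null guard means a 0-byte
-- file yields an empty list and deserialize is never called on it.
parseAll :: L.ByteString -> [L.ByteString]
parseAll = loop
  where
    loop l | L.null l = []
    loop l = let (a, rest) = deserialize l
             in a : loop rest
```

In this sketch `parseAll L.empty` is `[]`, which is the case loadCheckpoints maps to `M.empty`.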

Thomas, do you have any more info you could give about the mysterious
segfaults?

--
Gracjan