greylock segfault at indexing


Миша Минаев

Nov 12, 2015, 9:02:17 AM
to reverbrain
this is request {"mailbox":"test","docs":[{"id":"gitlab.srv.pv.km/mailpaas/Compose/build/gopath/src/golang.org/x/net/html/token_test.go","bucket":"b1","key":"key","timestamp":{"tsec":1440696489,"tnsec":1234},"index":{"text":"// Copyright 2010 The Go Authors. All rights reserved.\n// Use of this source code is governed by a BSD-style\n// license that can be found in the LICENSE file.\n\npackage html\n\nimport (\n\t\"bytes\"\n\t\"io\"\n\t\"io/ioutil\"\n\t\"reflect\"\n\t\"runtime\"\n\t\"strings\"\n\t\"testing\"\n)\n\ntype tokenTest struct {\n\t// A short description of the test case.\n\tdesc string\n\t// The HTML to parse.\n\thtml string\n\t// The string representations of the expected tokens, joined by '$'.\n\tgolden string\n}\n\nvar tokenTests = []tokenTest{\n\t{\n\t\t\"empty\",\n\t\t\"\",\n\t\t\"\",\n\t},\n\t// A single text node. The tokenizer should not break text nodes on whitespace,\n\t// nor should it normalize whitespace within a text node.\n\t{\n\t\t\"text\",\n\t\t\"foo  bar\",\n\t\t\"foo  bar\",\n\t},\n\t// An entity.\n\t{\n\t\t\"entity\",\n\t\t\"one \u0026lt; two\",\n\t\t\"one \u0026lt; two\",\n\t},\n\t// A start, self-closing and end tag. 
The tokenizer does not care if the start\n\t// and end tokens don't match; that is the job of the parser.\n\t{\n\t\t\"tags\",\n\t\t\"\u003ca\u003eb\u003cc/\u003ed\u003c/e\u003e\",\n\t\t\"\u003ca\u003e$b$\u003cc/\u003e$d$\u003c/e\u003e\",\n\t},\n\t// Angle brackets that aren't a tag.\n\t{\n\t\t\"not a tag #0\",\n\t\t\"\u003c\",\n\t\t\"\u0026lt;\",\n\t},\n\t{\n\t\t\"not a tag #1\",\n\t\t\"\u003c/\",\n\t\t\"\u0026lt;/\",\n\t},\n\t{\n\t\t\"not a tag #2\",\n\t\t\"\u003c/\u003e\",\n\t\t\"\u003c!----\u003e\",\n\t},\n\t{\n\t\t\"not a tag #3\",\n\t\t\"a\u003c/\u003eb\",\n\t\t\"a$\u003c!----\u003e$b\",\n\t},\n\t{\n\t\t\"not a tag #4\",\n\t\t\"\u003c/ \u003e\",\n\t\t\"\u003c!-- --\u003e\",\n\t},\n\t{\n\t\t\"not a tag #5\",\n\t\t\"\u003c/.\",\n\t\t\"\u003c!--.--\u003e\",\n\t},\n\t{\n\t\t\"not a tag #6\",\n\t\t\"\u003c/.\u003e\",\n\t\t\"\u003c!--.--\u003e\",\n\t},\n\t{\n\t\t\"not a tag #7\",\n\t\t\"a \u003c b\",\n\t\t\"a \u0026lt; b\",\n\t},\n\t{\n\t\t\"not a tag #8\",\n\t\t\"\u003c.\u003e\",\n\t\t\"\u0026lt;.\u0026gt;\",\n\t},\n\t{\n\t\t\"not a tag #9\",\n\t\t\"a\u003c\u003c\u003cb\u003e\u003e\u003ec\",\n\t\t\"a\u0026lt;\u0026lt;$\u003cb\u003e$\u0026gt;\u0026gt;c\",\n\t},\n\t{\n\t\t\"not a tag #10\",\n\t\t\"if x\u003c0 and y \u003c 0 then x*y\u003e0\",\n\t\t\"if x\u0026lt;0 and y \u0026lt; 0 then x*y\u0026gt;0\",\n\t},\n\t{\n\t\t\"not a tag #11\",\n\t\t\"\u003c\u003cp\u003e\",\n\t\t\"\u0026lt;$\u003cp\u003e\",\n\t},\n\t// EOF in a tag name.\n\t{\n\t\t\"tag name eof #0\",\n\t\t\"\u003ca\",\n\t\t\"\",\n\t},\n\t{\n\t\t\"tag name eof #1\",\n\t\t\"\u003ca \",\n\t\t\"\",\n\t},\n\t{\n\t\t\"tag name eof #2\",\n\t\t\"a\u003cb\",\n\t\t\"a\",\n\t},\n\t{\n\t\t\"tag name eof #3\",\n\t\t\"\u003ca\u003e\u003cb\",\n\t\t\"\u003ca\u003e\",\n\t},\n\t{\n\t\t\"tag name eof #4\",\n\t\t`\u003ca x`,\n\t\t``,\n\t},\n\t// Some malformed tags that are missing a '\u003e'.\n\t{\n\t\t\"malformed tag #0\",\n\t\t`\u003cp\u003c/p\u003e`,\n\t\t`\u003cp\u003c p=\"\"\u003e`,\n\t},\n\t{\n\t\t\"malformed tag 
#1\",\n\t\t`\u003cp \u003c/p\u003e`,\n\t\t`\u003cp \u003c=\"\" p=\"\"\u003e`,\n\t},\n\t{\n\t\t\"malformed tag #2\",\n\t\t`\u003cp id`,\n\t\t``,\n\t},\n\t{\n\t\t\"malformed tag #3\",\n\t\t`\u003cp id=`,\n\t\t``,\n\t},\n\t{\n\t\t\"malformed tag #4\",\n\t\t`\u003cp id=\u003e`,\n\t\t`\u003cp id=\"\"\u003e`,\n\t},\n\t{\n\t\t\"malformed tag #5\",\n\t\t`\u003cp id=0`,\n\t\t``,\n\t},\n\t{\n\t\t\"malformed tag #6\",\n\t\t`\u003cp id=0\u003c/p\u003e`,\n\t\t`\u003cp id=\"0\u0026lt;/p\"\u003e`,\n\t},\n\t{\n\t\t\"malformed tag #7\",\n\t\t`\u003cp id=\"0\u003c/p\u003e`,\n\t\t``,\n\t},\n\t{\n\t\t\"malformed tag #8\",\n\t\t`\u003cp id=\"0\"\u003c/p\u003e`,\n\t\t`\u003cp id=\"0\" \u003c=\"\" p=\"\"\u003e`,\n\t},\n\t{\n\t\t\"malformed tag #9\",\n\t\t`\u003cp\u003e\u003c/p id`,\n\t\t`\u003cp\u003e`,\n\t},\n\t// Raw text and RCDATA.\n\t{\n\t\t\"basic raw text\",\n\t\t\"\u003cscript\u003e\u003ca\u003e\u003c/b\u003e\u003c/script\u003e\",\n\t\t\"\u003cscript\u003e$\u0026lt;a\u0026gt;\u0026lt;/b\u0026gt;$\u003c/script\u003e\",\n\t},\n\t{\n\t\t\"unfinished script end tag\",\n\t\t\"\u003cSCRIPT\u003ea\u003c/SCR\",\n\t\t\"\u003cscript\u003e$a\u0026lt;/SCR\",\n\t},\n\t{\n\t\t\"broken script end tag\",\n\t\t\"\u003cSCRIPT\u003ea\u003c/SCR ipt\u003e\",\n\t\t\"\u003cscript\u003e$a\u0026lt;/SCR ipt\u0026gt;\",\n\t},\n\t{\n\t\t\"EOF in script end tag\",\n\t\t\"\u003cSCRIPT\u003ea\u003c/SCRipt\",\n\t\t\"\u003cscript\u003e$a\u0026lt;/SCRipt\",\n\t},\n\t{\n\t\t\"scriptx end tag\",\n\t\t\"\u003cSCRIPT\u003ea\u003c/SCRiptx\",\n\t\t\"\u003cscript\u003e$a\u0026lt;/SCRiptx\",\n\t},\n\t{\n\t\t\"' ' completes script end tag\",\n\t\t\"\u003cSCRIPT\u003ea\u003c/SCRipt \",\n\t\t\"\u003cscript\u003e$a\",\n\t},\n\t{\n\t\t\"'\u003e' completes script end tag\",\n\t\t\"\u003cSCRIPT\u003ea\u003c/SCRipt\u003e\",\n\t\t\"\u003cscript\u003e$a$\u003c/script\u003e\",\n\t},\n\t{\n\t\t\"self-closing script end 
tag\",\n\t\t\"\u003cSCRIPT\u003ea\u003c/SCRipt/\u003e\",\n\t\t\"\u003cscript\u003e$a$\u003c/script\u003e\",\n\t},\n\t{\n\t\t\"nested script tag\",\n\t\t\"\u003cSCRIPT\u003ea\u003c/SCRipt\u003cscript\u003e\",\n\t\t\"\u003cscript\u003e$a\u0026lt;/SCRipt\u0026lt;script\u0026gt;\",\n\t},\n\t{\n\t\t\"script end tag after unfinished\",\n\t\t\"\u003cSCRIPT\u003ea\u003c/SCRipt\u003c/script\u003e\",\n\t\t\"\u003cscript\u003e$a\u0026lt;/SCRipt$\u003c/script\u003e\",\n\t},\n\t{\n\t\t\"script/style mismatched tags\",\n\t\t\"\u003cscript\u003ea\u003c/style\u003e\",\n\t\t\"\u003cscript\u003e$a\u0026lt;/style\u0026gt;\",\n\t},\n\t{\n\t\t\"style element with entity\",\n\t\t\"\u003cstyle\u003e\u0026apos;\",\n\t\t\"\u003cstyle\u003e$\u0026amp;apos;\",\n\t},\n\t{\n\t\t\"textarea with tag\",\n\t\t\"\u003ctextarea\u003e\u003cdiv\u003e\u003c/textarea\u003e\",\n\t\t\"\u003ctextarea\u003e$\u0026lt;div\u0026gt;$\u003c/textarea\u003e\",\n\t},\n\t{\n\t\t\"title with tag and entity\",\n\t\t\"\u003ctitle\u003e\u003cb\u003eK\u0026amp;R C\u003c/b\u003e\u003c/title\u003e\",\n\t\t\"\u003ctitle\u003e$\u0026lt;b\u0026gt;K\u0026amp;R C\u0026lt;/b\u0026gt;$\u003c/title\u003e\",\n\t},\n\t// DOCTYPE tests.\n\t{\n\t\t\"Proper DOCTYPE\",\n\t\t\"\u003c!DOCTYPE html\u003e\",\n\t\t\"\u003c!DOCTYPE html\u003e\",\n\t},\n\t{\n\t\t\"DOCTYPE with no space\",\n\t\t\"\u003c!doctypehtml\u003e\",\n\t\t\"\u003c!DOCTYPE html\u003e\",\n\t},\n\t{\n\t\t\"DOCTYPE with two spaces\",\n\t\t\"\u003c!doctype  html\u003e\",\n\t\t\"\u003c!DOCTYPE html\u003e\",\n\t},\n\t{\n\t\t\"looks like DOCTYPE but isn't\",\n\t\t\"\u003c!DOCUMENT html\u003e\",\n\t\t\"\u003c!--DOCUMENT html--\u003e\",\n\t},\n\t{\n\t\t\"DOCTYPE at EOF\",\n\t\t\"\u003c!DOCtype\",\n\t\t\"\u003c!DOCTYPE \u003e\",\n\t},\n\t// XML processing instructions.\n\t{\n\t\t\"XML processing instruction\",\n\t\t\"\u003c?xml?\u003e\",\n\t\t\"\u003c!--?xml?--\u003e\",\n\t},\n\t// Comments.\n\t{\n\t\t\"comment0\",\n\t\t\"abc\u003cb\u003e\u003c!-- skipme 
--\u003e\u003c/b\u003edef\",\n\t\t\"abc$\u003cb\u003e$\u003c!-- skipme --\u003e$\u003c/b\u003e$def\",\n\t},\n\t{\n\t\t\"comment1\",\n\t\t\"a\u003c!--\u003ez\",\n\t\t\"a$\u003c!----\u003e$z\",\n\t},\n\t{\n\t\t\"comment2\",\n\t\t\"a\u003c!---\u003ez\",\n\t\t\"a$\u003c!----\u003e$z\",\n\t},\n\t{\n\t\t\"comment3\",\n\t\t\"a\u003c!--x\u003e--\u003ez\",\n\t\t\"a$\u003c!--x\u003e--\u003e$z\",\n\t},\n\t{\n\t\t\"comment4\",\n\t\t\"a\u003c!--x-\u003e--\u003ez\",\n\t\t\"a$\u003c!--x-\u003e--\u003e$z\",\n\t},\n\t{\n\t\t\"comment5\",\n\t\t\"a\u003c!\u003ez\",\n\t\t\"a$\u003c!----\u003e$z\",\n\t},\n\t{\n\t\t\"comment6\",\n\t\t\"a\u003c!-\u003ez\",\n\t\t\"a$\u003c!-----\u003e$z\",\n\t},\n\t{\n\t\t\"comment7\",\n\t\t\"a\u003c!---\u003c\u003ez\",\n\t\t\"a$\u003c!---\u003c\u003ez--\u003e\",\n\t},\n\t{\n\t\t\"comment8\",\n\t\t\"a\u003c!--z\",\n\t\t\"a$\u003c!--z--\u003e\",\n\t},\n\t{\n\t\t\"comment9\",\n\t\t\"a\u003c!--z-\",\n\t\t\"a$\u003c!--z--\u003e\",\n\t},\n\t{\n\t\t\"comment10\",\n\t\t\"a\u003c!--z--\",\n\t\t\"a$\u003c!--z--\u003e\",\n\t},\n\t{\n\t\t\"comment11\",\n\t\t\"a\u003c!--z---\",\n\t\t\"a$\u003c!--z---\u003e\",\n\t},\n\t{\n\t\t\"comment12\",\n\t\t\"a\u003c!--z----\",\n\t\t\"a$\u003c!--z----\u003e\",\n\t},\n\t{\n\t\t\"comment13\",\n\t\t\"a\u003c!--x--!\u003ez\",\n\t\t\"a$\u003c!--x--\u003e$z\",\n\t},\n\t// An attribute with a backslash.\n\t{\n\t\t\"backslash\",\n\t\t`\u003cp id=\"a\\\"b\"\u003e`,\n\t\t`\u003cp id=\"a\\\" b\"=\"\"\u003e`,\n\t},\n\t// Entities, tag name and attribute key lower-casing, and whitespace\n\t// normalization within a tag.\n\t{\n\t\t\"tricky\",\n\t\t\"\u003cp \\t\\n iD=\\\"a\u0026quot;B\\\"  foo=\\\"bar\\\"\u003e\u003cEM\u003ete\u0026lt;\u0026amp;;xt\u003c/em\u003e\u003c/p\u003e\",\n\t\t`\u003cp id=\"a\u0026#34;B\" foo=\"bar\"\u003e$\u003cem\u003e$te\u0026lt;\u0026amp;;xt$\u003c/em\u003e$\u003c/p\u003e`,\n\t},\n\t// A nonexistent entity. 
Tokenizing and converting back to a string should\n\t// escape the \"\u0026\" to become \"\u0026amp;\".\n\t{\n\t\t\"noSuchEntity\",\n\t\t`\u003ca b=\"c\u0026noSuchEntity;d\"\u003e\u0026lt;\u0026alsoDoesntExist;\u0026`,\n\t\t`\u003ca b=\"c\u0026amp;noSuchEntity;d\"\u003e$\u0026lt;\u0026amp;alsoDoesntExist;\u0026amp;`,\n\t},\n\t{\n\t\t\"entity without semicolon\",\n\t\t`\u0026notit;\u0026notin;\u003ca b=\"q=z\u0026amp=5\u0026notice=hello\u0026not;=world\"\u003e`,\n\t\t`¬it;∉$\u003ca b=\"q=z\u0026amp;amp=5\u0026amp;notice=hello¬=world\"\u003e`,\n\t},\n\t{\n\t\t\"entity with digits\",\n\t\t\"\u0026frac12;\",\n\t\t\"½\",\n\t},\n\t// Attribute tests:\n\t// http://dev.w3.org/html5/pf-summary/Overview.html#attributes\n\t{\n\t\t\"Empty attribute\",\n\t\t`\u003cinput disabled FOO\u003e`,\n\t\t`\u003cinput disabled=\"\" foo=\"\"\u003e`,\n\t},\n\t{\n\t\t\"Empty attribute, whitespace\",\n\t\t`\u003cinput disabled FOO \u003e`,\n\t\t`\u003cinput disabled=\"\" foo=\"\"\u003e`,\n\t},\n\t{\n\t\t\"Unquoted attribute value\",\n\t\t`\u003cinput value=yes FOO=BAR\u003e`,\n\t\t`\u003cinput value=\"yes\" foo=\"BAR\"\u003e`,\n\t},\n\t{\n\t\t\"Unquoted attribute value, spaces\",\n\t\t`\u003cinput value = yes FOO = BAR\u003e`,\n\t\t`\u003cinput value=\"yes\" foo=\"BAR\"\u003e`,\n\t},\n\t{\n\t\t\"Unquoted attribute value, trailing space\",\n\t\t`\u003cinput value=yes FOO=BAR \u003e`,\n\t\t`\u003cinput value=\"yes\" foo=\"BAR\"\u003e`,\n\t},\n\t{\n\t\t\"Single-quoted attribute value\",\n\t\t`\u003cinput value='yes' FOO='BAR'\u003e`,\n\t\t`\u003cinput value=\"yes\" foo=\"BAR\"\u003e`,\n\t},\n\t{\n\t\t\"Single-quoted attribute value, trailing space\",\n\t\t`\u003cinput value='yes' FOO='BAR' \u003e`,\n\t\t`\u003cinput value=\"yes\" foo=\"BAR\"\u003e`,\n\t},\n\t{\n\t\t\"Double-quoted attribute value\",\n\t\t`\u003cinput value=\"I'm an attribute\" FOO=\"BAR\"\u003e`,\n\t\t`\u003cinput value=\"I\u0026#39;m an attribute\" foo=\"BAR\"\u003e`,\n\t},\n\t{\n\t\t\"Attribute name 
characters\",\n\t\t`\u003cmeta http-equiv=\"content-type\"\u003e`,\n\t\t`\u003cmeta http-equiv=\"content-type\"\u003e`,\n\t},\n\t{\n\t\t\"Mixed attributes\",\n\t\t`a\u003cP V=\"0 1\" w='2' X=3 y\u003ez`,\n\t\t`a$\u003cp v=\"0 1\" w=\"2\" x=\"3\" y=\"\"\u003e$z`,\n\t},\n\t{\n\t\t\"Attributes with a solitary single quote\",\n\t\t`\u003cp id=can't\u003e\u003cp id=won't\u003e`,\n\t\t`\u003cp id=\"can\u0026#39;t\"\u003e$\u003cp id=\"won\u0026#39;t\"\u003e`,\n\t},\n}\n\nfunc TestTokenizer(t *testing.T) {\nloop:\n\tfor _, tt := range tokenTests {\n\t\tz := NewTokenizer(strings.NewReader(tt.html))\n\t\tif tt.golden != \"\" {\n\t\t\tfor i, s := range strings.Split(tt.golden, \"$\") {\n\t\t\t\tif z.Next() == ErrorToken {\n\t\t\t\t\tt.Errorf(\"%s token %d: want %q got error %v\", tt.desc, i, s, z.Err())\n\t\t\t\t\tcontinue loop\n\t\t\t\t}\n\t\t\t\tactual := z.Token().String()\n\t\t\t\tif s != actual {\n\t\t\t\t\tt.Errorf(\"%s token %d: want %q got %q\", tt.desc, i, s, actual)\n\t\t\t\t\tcontinue loop\n\t\t\t\t}\n\t\t\t}\n\t\t}\n\t\tz.Next()\n\t\tif z.Err() != io.EOF {\n\t\t\tt.Errorf(\"%s: want EOF got %q\", tt.desc, z.Err())\n\t\t}\n\t}\n}\n\nfunc TestMaxBuffer(t *testing.T) {\n\t// Exceeding the maximum buffer size generates ErrBufferExceeded.\n\tz := NewTokenizer(strings.NewReader(\"\u003c\" + strings.Repeat(\"t\", 10)))\n\tz.SetMaxBuf(5)\n\ttt := z.Next()\n\tif got, want := tt, ErrorToken; got != want {\n\t\tt.Fatalf(\"token type: got: %v want: %v\", got, want)\n\t}\n\tif got, want := z.Err(), ErrBufferExceeded; got != want {\n\t\tt.Errorf(\"error type: got: %v want: %v\", got, want)\n\t}\n\tif got, want := string(z.Raw()), \"\u003ctttt\"; got != want {\n\t\tt.Fatalf(\"buffered before overflow: got: %q want: %q\", got, want)\n\t}\n}\n\nfunc TestMaxBufferReconstruction(t *testing.T) {\n\t// Exceeding the maximum buffer size at any point while tokenizing permits\n\t// reconstructing the original input.\ntests:\n\tfor _, test := range tokenTests {\n\t\tfor maxBuf := 1; ; 
maxBuf++ {\n\t\t\tr := strings.NewReader(test.html)\n\t\t\tz := NewTokenizer(r)\n\t\t\tz.SetMaxBuf(maxBuf)\n\t\t\tvar tokenized bytes.Buffer\n\t\t\tfor {\n\t\t\t\ttt := z.Next()\n\t\t\t\ttokenized.Write(z.Raw())\n\t\t\t\tif tt == ErrorToken {\n\t\t\t\t\tif err := z.Err(); err != io.EOF \u0026\u0026 err != ErrBufferExceeded {\n\t\t\t\t\t\tt.Errorf(\"%s: unexpected error: %v\", test.desc, err)\n\t\t\t\t\t}\n\t\t\t\t\tbreak\n\t\t\t\t}\n\t\t\t}\n\t\t\t// Anything tokenized along with untokenized input or data left in the reader.\n\t\t\tassembled, err := ioutil.ReadAll(io.MultiReader(\u0026tokenized, bytes.NewReader(z.Buffered()), r))\n\t\t\tif err != nil {\n\t\t\t\tt.Errorf(\"%s: ReadAll: %v\", test.desc, err)\n\t\t\t\tcontinue tests\n\t\t\t}\n\t\t\tif got, want := string(assembled), test.html; got != want {\n\t\t\t\tt.Errorf(\"%s: reassembled html:\\n got: %q\\nwant: %q\", test.desc, got, want)\n\t\t\t\tcontinue tests\n\t\t\t}\n\t\t\t// EOF indicates that we completed tokenization and hence found the max\n\t\t\t// maxBuf that generates ErrBufferExceeded, so continue to the next test.\n\t\t\tif z.Err() == io.EOF {\n\t\t\t\tbreak\n\t\t\t}\n\t\t} // buffer sizes\n\t} // tests\n}\n\nfunc TestPassthrough(t *testing.T) {\n\t// Accumulating the raw output for each parse event should reconstruct the\n\t// original input.\n\tfor _, test := range tokenTests {\n\t\tz := NewTokenizer(strings.NewReader(test.html))\n\t\tvar parsed bytes.Buffer\n\t\tfor {\n\t\t\ttt := z.Next()\n\t\t\tparsed.Write(z.Raw())\n\t\t\tif tt == ErrorToken {\n\t\t\t\tbreak\n\t\t\t}\n\t\t}\n\t\tif got, want := parsed.String(), test.html; got != want {\n\t\t\tt.Errorf(\"%s: parsed output:\\n got: %q\\nwant: %q\", test.desc, got, want)\n\t\t}\n\t}\n}\n\nfunc TestBufAPI(t *testing.T) {\n\ts := \"0\u003ca\u003e1\u003c/a\u003e2\u003cb\u003e3\u003ca\u003e4\u003ca\u003e5\u003c/a\u003e6\u003c/b\u003e7\u003c/a\u003e8\u003ca/\u003e9\"\n\tz := NewTokenizer(bytes.NewBufferString(s))\n\tvar result bytes.Buffer\n\tdepth 
:= 0\nloop:\n\tfor {\n\t\ttt := z.Next()\n\t\tswitch tt {\n\t\tcase ErrorToken:\n\t\t\tif z.Err() != io.EOF {\n\t\t\t\tt.Error(z.Err())\n\t\t\t}\n\t\t\tbreak loop\n\t\tcase TextToken:\n\t\t\tif depth \u003e 0 {\n\t\t\t\tresult.Write(z.Text())\n\t\t\t}\n\t\tcase StartTagToken, EndTagToken:\n\t\t\ttn, _ := z.TagName()\n\t\t\tif len(tn) == 1 \u0026\u0026 tn[0] == 'a' {\n\t\t\t\tif tt == StartTagToken {\n\t\t\t\t\tdepth++\n\t\t\t\t} else {\n\t\t\t\t\tdepth--\n\t\t\t\t}\n\t\t\t}\n\t\t}\n\t}\n\tu := \"14567\"\n\tv := string(result.Bytes())\n\tif u != v {\n\t\tt.Errorf(\"TestBufAPI: want %q got %q\", u, v)\n\t}\n}\n\nfunc TestConvertNewlines(t *testing.T) {\n\ttestCases := map[string]string{\n\t\t\"Mac\\rDOS\\r\\nUnix\\n\":    \"Mac\\nDOS\\nUnix\\n\",\n\t\t\"Unix\\nMac\\rDOS\\r\\n\":    \"Unix\\nMac\\nDOS\\n\",\n\t\t\"DOS\\r\\nDOS\\r\\nDOS\\r\\n\": \"DOS\\nDOS\\nDOS\\n\",\n\t\t\"\":         \"\",\n\t\t\"\\n\":       \"\\n\",\n\t\t\"\\n\\r\":     \"\\n\\n\",\n\t\t\"\\r\":       \"\\n\",\n\t\t\"\\r\\n\":     \"\\n\",\n\t\t\"\\r\\n\\n\":   \"\\n\\n\",\n\t\t\"\\r\\n\\r\":   \"\\n\\n\",\n\t\t\"\\r\\n\\r\\n\": \"\\n\\n\",\n\t\t\"\\r\\r\":     \"\\n\\n\",\n\t\t\"\\r\\r\\n\":   \"\\n\\n\",\n\t\t\"\\r\\r\\n\\n\": \"\\n\\n\\n\",\n\t\t\"\\r\\r\\r\\n\": \"\\n\\n\\n\",\n\t\t\"\\r \\n\":    \"\\n \\n\",\n\t\t\"xyz\":      \"xyz\",\n\t}\n\tfor in, want := range testCases {\n\t\tif got := string(convertNewlines([]byte(in))); got != want {\n\t\t\tt.Errorf(\"input %q: got %q, want %q\", in, got, want)\n\t\t}\n\t}\n}\n\nfunc TestReaderEdgeCases(t *testing.T) {\n\tconst s = \"\u003cp\u003eAn io.Reader can return (0, nil) or (n, io.EOF).\u003c/p\u003e\"\n\ttestCases := []io.Reader{\n\t\t\u0026zeroOneByteReader{s: s},\n\t\t\u0026eofStringsReader{s: s},\n\t\t\u0026stuckReader{},\n\t}\n\tfor i, tc := range testCases {\n\t\tgot := []TokenType{}\n\t\tz := NewTokenizer(tc)\n\t\tfor {\n\t\t\ttt := z.Next()\n\t\t\tif tt == ErrorToken {\n\t\t\t\tbreak\n\t\t\t}\n\t\t\tgot = append(got, 
tt)\n\t\t}\n\t\tif err := z.Err(); err != nil \u0026\u0026 err != io.EOF {\n\t\t\tif err != io.ErrNoProgress {\n\t\t\t\tt.Errorf(\"i=%d: %v\", i, err)\n\t\t\t}\n\t\t\tcontinue\n\t\t}\n\t\twant := []TokenType{\n\t\t\tStartTagToken,\n\t\t\tTextToken,\n\t\t\tEndTagToken,\n\t\t}\n\t\tif !reflect.DeepEqual(got, want) {\n\t\t\tt.Errorf(\"i=%d: got %v, want %v\", i, got, want)\n\t\t\tcontinue\n\t\t}\n\t}\n}\n\n// zeroOneByteReader is like a strings.Reader that alternates between\n// returning 0 bytes and 1 byte at a time.\ntype zeroOneByteReader struct {\n\ts string\n\tn int\n}\n\nfunc (r *zeroOneByteReader) Read(p []byte) (int, error) {\n\tif len(p) == 0 {\n\t\treturn 0, nil\n\t}\n\tif len(r.s) == 0 {\n\t\treturn 0, io.EOF\n\t}\n\tr.n++\n\tif r.n%2 != 0 {\n\t\treturn 0, nil\n\t}\n\tp[0], r.s = r.s[0], r.s[1:]\n\treturn 1, nil\n}\n\n// eofStringsReader is like a strings.Reader but can return an (n, err) where\n// n \u003e 0 \u0026\u0026 err != nil.\ntype eofStringsReader struct {\n\ts string\n}\n\nfunc (r *eofStringsReader) Read(p []byte) (int, error) {\n\tn := copy(p, r.s)\n\tr.s = r.s[n:]\n\tif r.s != \"\" {\n\t\treturn n, nil\n\t}\n\treturn n, io.EOF\n}\n\n// stuckReader is an io.Reader that always returns no data and no error.\ntype stuckReader struct{}\n\nfunc (*stuckReader) Read(p []byte) (int, error) {\n\treturn 0, nil\n}\n\nconst (\n\trawLevel = iota\n\tlowLevel\n\thighLevel\n)\n\nfunc benchmarkTokenizer(b *testing.B, level int) {\n\tbuf, err := ioutil.ReadFile(\"testdata/go1.html\")\n\tif err != nil {\n\t\tb.Fatalf(\"could not read testdata/go1.html: %v\", err)\n\t}\n\tb.SetBytes(int64(len(buf)))\n\truntime.GC()\n\tb.ReportAllocs()\n\tb.ResetTimer()\n\tfor i := 0; i \u003c b.N; i++ {\n\t\tz := NewTokenizer(bytes.NewBuffer(buf))\n\t\tfor {\n\t\t\ttt := z.Next()\n\t\t\tif tt == ErrorToken {\n\t\t\t\tif err := z.Err(); err != nil \u0026\u0026 err != io.EOF {\n\t\t\t\t\tb.Fatalf(\"tokenizer error: %v\", err)\n\t\t\t\t}\n\t\t\t\tbreak\n\t\t\t}\n\t\t\tswitch level 
{\n\t\t\tcase rawLevel:\n\t\t\t\t// Calling z.Raw just returns the raw bytes of the token. It does\n\t\t\t\t// not unescape \u0026lt; to \u003c, or lower-case tag names and attribute keys.\n\t\t\t\tz.Raw()\n\t\t\tcase lowLevel:\n\t\t\t\t// Caling z.Text, z.TagName and z.TagAttr returns []byte values\n\t\t\t\t// whose contents may change on the next call to z.Next.\n\t\t\t\tswitch tt {\n\t\t\t\tcase TextToken, CommentToken, DoctypeToken:\n\t\t\t\t\tz.Text()\n\t\t\t\tcase StartTagToken, SelfClosingTagToken:\n\t\t\t\t\t_, more := z.TagName()\n\t\t\t\t\tfor more {\n\t\t\t\t\t\t_, _, more = z.TagAttr()\n\t\t\t\t\t}\n\t\t\t\tcase EndTagToken:\n\t\t\t\t\tz.TagName()\n\t\t\t\t}\n\t\t\tcase highLevel:\n\t\t\t\t// Calling z.Token converts []byte values to strings whose validity\n\t\t\t\t// extend beyond the next call to z.Next.\n\t\t\t\tz.Token()\n\t\t\t}\n\t\t}\n\t}\n}\n\nfunc BenchmarkRawLevelTokenizer(b *testing.B)  { benchmarkTokenizer(b, rawLevel) }\nfunc BenchmarkLowLevelTokenizer(b *testing.B)  { benchmarkTokenizer(b, lowLevel) }\nfunc BenchmarkHighLevelTokenizer(b *testing.B) { benchmarkTokenizer(b, highLevel) }\n"}},{"bucket":"b1"}]}

backtrace:

#0  0x00007fb80fef9e37 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007fb80fefb528 in __GI_abort () at abort.c:89
#2  0x00007fb81080505d in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#3  0x00007fb810802ed6 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#4  0x00007fb810802f21 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#5  0x00007fb810803139 in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#6  0x000000000055de31 in msgpack::operator>> (o=..., meta=...) at /greylock/include/greylock/index.hpp:755
#7  0x00000000005e4aad in msgpack::object::convert<ioremap::greylock::index_meta> (this=0x7fb809117e10, v=0x7fb809117e50) at /usr/include/msgpack/object.hpp:266
#8  0x00000000005d0b6e in msgpack::object::as<ioremap::greylock::index_meta> (this=0x7fb809117e10) at /usr/include/msgpack/object.hpp:273
#9  0x00000000005b550c in ioremap::greylock::index<ioremap::greylock::bucket_transport>::index (this=0x7fb809118340, t=..., sk=..., read_only=false) at /greylock/include/greylock/index.hpp:123
#10 0x0000000000598a9c in ioremap::greylock::read_write_index<ioremap::greylock::bucket_transport>::read_write_index (this=0x7fb809118340, t=..., start=...) at /greylock/include/greylock/index.hpp:704
#11 0x000000000057e548 in http_server::on_index::process_one_document (this=0x7fb7e40011e0, req=..., mbox="test", doc=..., idxs=...) at /greylock/src/server.cpp:640
#12 0x000000000057f512 in http_server::on_index::parse_docs (this=0x7fb7e40011e0, req=..., mbox="test", docs=...) at /greylock/src/server.cpp:707
#13 0x000000000057ffaa in http_server::on_index::on_request (this=0x7fb7e40011e0, req=..., buffer=...) at /greylock/src/server.cpp:752
#14 0x0000000000632ee5 in ioremap::thevoid::simple_request_stream<http_server>::on_close (this=0x7fb7e40011e0, err=...) at /usr/include/thevoid/stream.hpp:533
#15 0x00007fb811426e89 in ioremap::thevoid::connection<boost::asio::basic_stream_socket<boost::asio::ip::tcp, boost::asio::stream_socket_service<boost::asio::ip::tcp> > >::process_data() ()
   from /usr/lib/libthevoid.so.3
#16 0x00007fb811429531 in ioremap::thevoid::connection<boost::asio::basic_stream_socket<boost::asio::ip::tcp, boost::asio::stream_socket_service<boost::asio::ip::tcp> > >::handle_read(boost::system::error_code const&, unsigned long) () from /usr/lib/libthevoid.so.3
#17 0x00007fb811416168 in boost::asio::detail::reactive_socket_recv_op<boost::asio::mutable_buffers_1, ioremap::thevoid::detail::attributes_bind_handler<std::_Bind<std::_Mem_fn<void (ioremap::thevoid::connection<boost::asio::basic_stream_socket<boost::asio::ip::tcp, boost::asio::stream_socket_service<boost::asio::ip::tcp> > >::*)(boost::system::error_code const&, unsigned long)> (std::shared_ptr<ioremap::thevoid::connection<boost::asio::basic_stream_socket<boost::asio::ip::tcp, boost::asio::stream_socket_service<boost::asio::ip::tcp> > > >, std::_Placeholder<1>, std::_Placeholder<2>)> > >::do_complete(boost::asio::detail::task_io_service*, boost::asio::detail::task_io_service_operation*, boost::system::error_code const&, unsigned long) () from /usr/lib/libthevoid.so.3
#18 0x00007fb8113ece71 in boost::asio::detail::task_io_service::run(boost::system::error_code&) () from /usr/lib/libthevoid.so.3
#19 0x00007fb8113ee06e in boost::detail::thread_data<ioremap::thevoid::io_service_runner>::run() () from /usr/lib/libthevoid.so.3
#20 0x00007fb810abed3a in ?? () from /usr/lib/x86_64-linux-gnu/libboost_thread.so.1.55.0
#21 0x00007fb810cd20a5 in start_thread (arg=0x7fb80911b700) at pthread_create.c:309
#22 0x00007fb80ffbccfd in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
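In the backtrace, the throw from msgpack::operator>> at index.hpp:755 (frame #6) goes from __cxa_throw straight into std::terminate()/abort, i.e. the exception escapes without finding any handler in the thevoid/boost::asio callback chain. A minimal sketch of catching the deserialization failure at the boundary instead of crashing; all names here are hypothetical stand-ins, not greylock's actual API, and the parser only models msgpack-c's "throws on malformed input" contract:

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Stand-in for ioremap::greylock::index_meta (hypothetical).
struct index_meta {
	uint64_t generation = 0;
};

// Stand-in for msgpack::object::as<index_meta>(): throws on a
// truncated/corrupted blob, like msgpack::type_error does.
index_meta parse_meta(const std::vector<uint8_t> &blob) {
	if (blob.size() < 8)
		throw std::runtime_error("index_meta: truncated metadata blob");
	index_meta m;
	for (int i = 0; i < 8; ++i)
		m.generation = (m.generation << 8) | blob[i];
	return m;
}

// Defensive wrapper: convert a deserialization failure into an error
// status instead of letting the exception propagate into the event-loop
// callbacks, where an unhandled throw ends in std::terminate() as above.
bool try_parse_meta(const std::vector<uint8_t> &blob,
		index_meta &out, std::string &err) {
	try {
		out = parse_meta(blob);
		return true;
	} catch (const std::exception &e) {
		err = e.what();
		return false;
	}
}
```

In the real code the try/catch would presumably sit around the object::as<index_meta>() call in the index constructor (frame #9), so corrupted metadata turns into an error response rather than an abort.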

Evgeniy Polyakov

Nov 12, 2015, 10:10:10 AM
to Миша Минаев, reverbrain
Hi

12.11.2015, 17:02, "Миша Минаев" <minae...@gmail.com>:

> backtrace:
>
> #0  0x00007fb80fef9e37 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
> #1  0x00007fb80fefb528 in __GI_abort () at abort.c:89
> #2  0x00007fb81080505d in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> #3  0x00007fb810802ed6 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> #4  0x00007fb810802f21 in std::terminate() () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> #5  0x00007fb810803139 in __cxa_throw () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
> #6  0x000000000055de31 in msgpack::operator>> (o=..., meta=...) at /greylock/include/greylock/index.hpp:755
> #7  0x00000000005e4aad in msgpack::object::convert<ioremap::greylock::index_meta> (this=0x7fb809117e10, v=0x7fb809117e50) at /usr/include/msgpack/object.hpp:266
> #8  0x00000000005d0b6e in msgpack::object::as<ioremap::greylock::index_meta> (this=0x7fb809117e10) at /usr/include/msgpack/object.hpp:273
> #9  0x00000000005b550c in ioremap::greylock::index<ioremap::greylock::bucket_transport>::index (this=0x7fb809118340, t=..., sk=..., read_only=false) at /greylock/include/greylock/index.hpp:123
> #10 0x0000000000598a9c in ioremap::greylock::read_write_index<ioremap::greylock::bucket_transport>::read_write_index (this=0x7fb809118340, t=..., start=...) at /greylock/include/greylock/index.hpp:704
> #11 0x000000000057e548 in http_server::on_index::process_one_document (this=0x7fb7e40011e0, req=..., mbox="test", doc=..., idxs=...) at /greylock/src/server.cpp:640


Looks like we do not properly sanitize input data.
We will look into it. Thank you.

Evgeniy Polyakov

Nov 14, 2015, 3:44:20 PM
to Миша Минаев, reverbrain
Hi

12.11.2015, 18:10, "Evgeniy Polyakov" <z...@ioremap.net>:
I've added protection against this; as it turns out, your particular request had actually been indexed correctly.
This error arises when the on-disk metadata (i.e. stored in elliptics) is corrupted, which should never happen unless
someone overwrote those files in parallel.
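One way to make such on-disk corruption detectable before it reaches the deserializer, sketched here with hypothetical helper names (the actual greylock on-disk format and the protection added may differ): checksum the serialized metadata on write and verify it on read, so a damaged blob is rejected instead of being handed to object::as<index_meta>():

```cpp
#include <cstdint>
#include <vector>

// FNV-1a, 32-bit: cheap and good enough to catch random corruption.
static uint32_t checksum(const std::vector<uint8_t> &data) {
	uint32_t h = 2166136261u;
	for (uint8_t b : data) {
		h ^= b;
		h *= 16777619u;
	}
	return h;
}

// Prefix the serialized metadata payload with its checksum (little-endian).
std::vector<uint8_t> seal(const std::vector<uint8_t> &payload) {
	uint32_t h = checksum(payload);
	std::vector<uint8_t> out;
	for (int i = 0; i < 4; ++i)
		out.push_back(static_cast<uint8_t>(h >> (8 * i)));
	out.insert(out.end(), payload.begin(), payload.end());
	return out;
}

// Strip the header and verify: returns false if the blob is too short or
// the checksum does not match, i.e. the metadata must not be deserialized.
bool unseal(const std::vector<uint8_t> &blob, std::vector<uint8_t> &payload) {
	if (blob.size() < 4)
		return false;
	uint32_t h = 0;
	for (int i = 0; i < 4; ++i)
		h |= static_cast<uint32_t>(blob[i]) << (8 * i);
	payload.assign(blob.begin() + 4, blob.end());
	return checksum(payload) == h;
}
```

A checksum catches accidental corruption but not a concurrent writer racing the reader; the "overwritten in parallel" case would still need locking or versioned updates on top of this.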

Please provide your config and insert/select JSONs, and both greylock and ioserv logs as attachments.