Go implementation of CLP

ChrisLu

Sep 30, 2022, 3:58:44 AM
to golang-nuts
Seems there is no Go implementation of Compressed Log Processor (CLP) yet?

CLP is a tool capable of losslessly compressing text logs and searching the compressed logs without decompression.

https://github.com/y-scope/clp

Chris

david lion

Oct 2, 2022, 9:45:55 PM
to golang-nuts
Hi Chris,

I'm one of the CLP developers. We'd be interested in hearing your use case and any features you'd like implemented in CLP.
As for a Go implementation, are you asking about bindings to directly call CLP functionality from a Go program or are you interested in a complete re-write in Go?

-david

Chris Lu

Oct 2, 2022, 10:36:38 PM
to david lion, golang-nuts
Thanks! CLP is great! 

I am working on a distributed file system, SeaweedFS, https://github.com/seaweedfs/seaweedfs
I am interested in a pure-Go implementation to store the log files more efficiently.

Chris

david lion

Oct 4, 2022, 12:33:34 PM
to golang-nuts
It would be awesome to help SeaweedFS, so I'm eager to hear more about your ideas and to understand what we can do.

Sorry if some of my questions are naive. At a high level I'm trying to better understand two topics:
1. the use cases (or features) we/CLP can help with
2. the technical aspects/limitations that make a pure-Go implementation necessary/beneficial

I saw that SeaweedFS can currently compress certain file types automatically with Gzip (very cool, btw).
I can imagine that augmenting this feature with CLP for log file types could be beneficial.
Are these the log files you're referring to? Does SeaweedFS itself also produce log files that can benefit from CLP? Perhaps CLP could help with both cases?

As I imagine packaging CLP in its current form with SeaweedFS would be awkward, is packaging a main reason for wanting a pure-Go implementation?
Would a library with a programmatic API be preferable for integration, rather than calling out to a separate tool/executable (as CLP's current form requires)?
Would a Go package using cgo (to reuse some of the existing C++ code) be acceptable, or is there a requirement for strictly Go code?
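
For illustration, a library with a programmatic API of that kind might look roughly like the sketch below; the package and method names are hypothetical, not an existing CLP interface.

package clp // hypothetical package name, not an existing CLP Go module

import "io"

// Compressor sketches the "library with a programmatic API" option:
// SeaweedFS could call it directly instead of shelling out to an
// external CLP executable. All names here are illustrative.
type Compressor interface {
	// Compress reads plain-text log data from r and writes a
	// compressed, searchable archive to w.
	Compress(r io.Reader, w io.Writer) error
}

// Searcher sketches the corresponding query side: run a query against
// a compressed archive without decompressing it first.
type Searcher interface {
	Search(archive io.ReaderAt, query string) (matches []string, err error)
}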

-david

Bharghava Varun Ayada

Nov 9, 2022, 3:14:03 AM
to golang-nuts
I'm exploring CLP for a logging platform, and I think either a cgo binding or a native Go library is something that would help us.

Jason E. Aten

Nov 13, 2022, 4:28:12 AM
to golang-nuts
CLP looks impressive. 

If there is a C API, then it would probably be easy to write Go bindings for it.

Since there is some kind of Python integration, perhaps a C API is already there.
I took a quick look but could only see C++ code; it would have to be an actual C (not C++) API for cgo to be usable.

Usually this is not too difficult to add to a C++ project.
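
For illustration, once such a C API exists, the Go/cgo side could look roughly like the sketch below; the function name and signature declared in the cgo preamble are assumptions invented for this example, not something CLP ships today.

package clpc

/*
#include <stdlib.h>
// Assumed prototype for a future extern "C" wrapper around the C++ code;
// this function does not exist in CLP yet.
int clp_compress_file(const char* in_path, const char* out_path);
*/
import "C"

import (
	"fmt"
	"unsafe"
)

// CompressFile shows the usual cgo pattern: convert Go strings to C strings,
// call the C entry point, and free the C memory afterwards.
func CompressFile(inPath, outPath string) error {
	cin := C.CString(inPath)
	cout := C.CString(outPath)
	defer C.free(unsafe.Pointer(cin))
	defer C.free(unsafe.Pointer(cout))
	if rc := C.clp_compress_file(cin, cout); rc != 0 {
		return fmt.Errorf("clp_compress_file returned %d", int(rc))
	}
	return nil
}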

david lion

Nov 18, 2022, 2:20:58 PM
to golang-nuts
At the moment we're working on a Go binding around the parsing component of CLP.
You're right, the current code is all C++, but we'll be making the parsing component standalone and creating a C API for it.

Right now, we're only creating bindings for the parsing component, since it seems to be the common element in the use cases we have come across, but we'd love to hear about any use cases you're interested in and how we could support them. For example, with SeaweedFS it makes more sense to integrate most of the CLP logic directly into the filesystem itself.

Jason E. Aten

Nov 18, 2022, 8:09:06 PM
to david lion, golang-nuts
Sorry for not being familiar with the terminology. I thought the search through compressed logs was the most interesting part. Is that included in the "parsing component"?

david lion

Nov 19, 2022, 9:45:22 AM
to golang-nuts
Ah, the parsing component doesn't include search, but it is a necessary piece for enabling search over compressed logs. CLP's search is enabled by the storage format it uses (which both compresses and indexes the logs), and the parsing component is how we extract the information from the logs that the storage format needs.
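
As a much-simplified illustration of that idea (not CLP's actual parser or on-disk format), the parsing step can be thought of as splitting each message into a static template plus the variable values that filled it:

package main

import (
	"fmt"
	"regexp"
)

// varPattern is a crude stand-in for real tokenization: here, runs of digits
// are treated as the "variable" parts of a message. CLP's real parser and
// storage format are far more sophisticated; this only shows the split.
var varPattern = regexp.MustCompile(`\d+`)

func main() {
	msg := "connected to node 10 in 257 ms"
	vars := varPattern.FindAllString(msg, -1)         // vars == []string{"10", "257"}
	tmpl := varPattern.ReplaceAllString(msg, "<var>") // static template, stored once per log type
	fmt.Printf("template: %q\nvariables: %v\n", tmpl, vars)
}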

We can write bindings for search as well---we only started with compression because that's what we had more requests for. To focus our effort, could you let us know more about what kind of queries you want to run? The open-source C++ code supports wildcard queries but some users prefer grep-like regex, SQL-like boolean expressions, etc. We haven't yet settled on one, especially for a binding that needs to be somewhat stable.

Jason E. Aten

Nov 19, 2022, 4:34:27 PM
to david lion, golang-nuts
On Sat, Nov 19, 2022 at 8:45 AM david lion <david...@gmail.com> wrote:
> ...could you let us know more about what kind of queries you want to run? The open-source C++ code supports wildcard queries but some users prefer grep-like regex, SQL-like boolean expressions, etc. We haven't yet settled on one, especially for a binding that needs to be somewhat stable.

I'm not sure what SeaweedFS needs, so maybe Chris Lu will chime in too.

Usually when I'm going through logs, I can narrow my search down to a time span with at least roughly known begin and end points. Inside that span of log entries, though, it's really anything goes: it could be content-based search, it could be field = value search; it just depends on the task at hand. So the only thing I can say with conviction is: hopefully you can make it very easy to specify an ISO8601 begin timestamp and an ISO8601 endx timestamp (both down to the nanosecond) so that only the [begin, endx) interval is searched. The [begin, endx) convention is typical for rolling queries that should not overlap, since incrementing both the begin and endx timestamps by (endx - begin), for example, will partition your data (in the mathematical sense of creating exhaustive and mutually exclusive subsets).
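
A tiny Go sketch of that half-open [begin, endx) convention, just to make the rolling-window idea concrete (illustrative only):

package main

import (
	"fmt"
	"time"
)

// inWindow reports whether t falls in the half-open interval [begin, endx),
// so that consecutive windows advanced by (endx - begin) never overlap and
// never skip an entry.
func inWindow(t, begin, endx time.Time) bool {
	return !t.Before(begin) && t.Before(endx)
}

func main() {
	begin := time.Date(2022, 11, 19, 0, 0, 0, 0, time.UTC)
	endx := begin.Add(time.Hour)
	entry := begin.Add(30 * time.Minute)
	fmt.Println(inWindow(entry, begin, endx)) // true

	// Rolling both endpoints forward by (endx - begin) partitions the timeline:
	// the same entry cannot match two consecutive windows.
	begin, endx = endx, endx.Add(endx.Sub(begin))
	fmt.Println(inWindow(entry, begin, endx)) // false
}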

Jason E. Aten

Nov 19, 2022, 6:25:33 PM
to david lion, golang-nuts
On the off-chance that ISO8601 timestamp parsing is new to you (unlikely, since you're dealing with logs), I'll point out that parsing them is relatively easy in Go:

import "time"
const RFC3339NanoNumericTZ = "2006-01-02T15:04:05.999999999-07:00"
tmstr := "2019-05-23T15:09:57.000-05:00"
tm, err = time.Parse(RFC3339NanoNumericTZ, tmstr)
panicOn(err)