Package digest

Jiahao Chen

Sep 25, 2014, 2:58:43 PM
to juli...@googlegroups.com
I subscribe to Python Weekly and find it a very nice digest of new and notable things in Python land. I've wished that something similar would happen for Julia, but I don't want the job of manually curating a newsletter. So instead, I wrote a script to see how much of the job of creating a weekly digest can be automated.

This gist contains a quick hack that summarizes changes to the Julia package repository in the past week. It scans the local METADATA repository for commit messages and identifies packages by name and version tag (the commits have subjects like "Tag XXXX vN.N.N").
  • For newly registered packages (with commit subject "Register XXXX..."), the script also downloads the package's README.md from its GitHub repository and grabs a couple of paragraphs from the beginning of the file to provide a summary of what the package is about.
  • For packages with updated version tags, the script scans the relevant commit messages to see if anything interesting was entered into the commit message. (Merge commits are ignored.)
As you can see from the output of the script, not many people enter interesting text into the commit message when submitting to METADATA. But if people are interested in using the commit message as a mechanism for providing summary updates, this script may also be useful for keeping track of package updates. Another possibility is to have the script clone and traverse each package's commit history as well, but that is a considerably more tedious task, and I'm not sure package developers would want all the gory details of their package development revealed in a package digest.
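
For readers who want a concrete picture, the subject-line scan and README excerpt described above can be sketched roughly as follows. This is a Python illustration, not the Julia script in the gist. The regular expressions assume subjects shaped exactly like the two examples quoted in the post, and the function names (`classify_subject`, `readme_summary`) and the rule of skipping heading and badge paragraphs are hypothetical choices made for the sketch.

```python
import re

# Assumed subject formats, based only on the examples quoted in the post:
#   "Tag XXXX vN.N.N"  and  "Register XXXX..."
TAG_RE = re.compile(r"^Tag\s+(?P<name>\S+)\s+v(?P<version>\d+\.\d+\.\d+)")
REGISTER_RE = re.compile(r"^Register\s+(?P<name>[\w.-]+)")


def normalize(name):
    """Drop trailing dots and an optional '.jl' suffix from a package name."""
    name = name.rstrip(".")
    return name[:-3] if name.endswith(".jl") else name


def classify_subject(subject):
    """Return (kind, package, version) for a commit subject, or None."""
    subject = subject.strip()
    if subject.startswith("Merge "):
        return None  # merge commits are ignored
    m = TAG_RE.match(subject)
    if m:
        return ("tag", normalize(m.group("name")), m.group("version"))
    m = REGISTER_RE.match(subject)
    if m:
        return ("register", normalize(m.group("name")), None)
    return None  # anything else is not a package event


def readme_summary(text, max_paragraphs=2):
    """Return the first few prose paragraphs of a README.

    Assumption: paragraphs that start with '#' (headings) or '[![' (badges)
    are skipped so the excerpt begins with descriptive text.
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    prose = [p for p in paragraphs if not p.startswith(("#", "[!["))]
    return "\n\n".join(prose[:max_paragraphs])
```

In practice the subjects would come from something like `git log --since="1 week ago" --format=%s` run inside the local METADATA checkout, with each line passed to `classify_subject`.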

Ivar Nesje

Sep 25, 2014, 3:20:20 PM
to juli...@googlegroups.com
Good idea!

I don't think we can get anything really worth reading, unless we have a minimal amount of manual curation, but a script would be a great start, and curating might only consist of deleting the bad parts.

For maintenance of the release-0.3 branch we have a @juliabackports user that receives an email whenever it is mentioned in a conversation, and its email address is a Google group. We might consider creating a similar @juliaweekly user and letting people ping it when they want a PR, issue, or commit mentioned in the weekly digest. People should provide a tweet-sized summary when they mention @juliaweekly, so you don't have to write anything.

Ivar

Steve Kelly

Sep 25, 2014, 3:20:50 PM
to juli...@googlegroups.com
I think encouraging a NEWS.md would make this easier.

Miles Lubin

Sep 25, 2014, 4:11:33 PM
to juli...@googlegroups.com
+1 for NEWS.md. This is probably the most effective way to communicate breaking changes and new features to users, versus digging through commit logs/PRs/Google groups. Hooking this into an automated infrastructure would be even better.

Jiahao Chen

Sep 26, 2014, 12:01:42 PM
to juli...@googlegroups.com
I've finished the next iteration of the digester script, which also traverses into each package's repository to pull out one-line summaries of every commit between the most recent tagged version and the most recent tagged version _prior_ to the oldest tagged version in the digest time range. (The idea is to find all the changes from the most recent version at the start of the digest period to the most recent version at the end of the digest period.)
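
The range selection described above can be illustrated with a small sketch. It is written in Python rather than taken from the Julia gist, and it assumes each package's tags are available as `(tag_name, tag_date)` pairs whose names double as git revision names (for example `v0.2.1`). The function names `digest_range` and `git_log_args` are hypothetical.

```python
from datetime import date


def digest_range(tags, start, end):
    """Pick the (base, head) tags whose difference covers the digest period.

    tags: list of (tag_name, tag_date) pairs in any order.
    head is the newest tag dated within [start, end]; base is the tag
    immediately before the oldest tag in that period, or None if the
    package had no earlier tag. Returns None when nothing was tagged
    during the period.
    """
    ordered = sorted(tags, key=lambda t: t[1])
    in_period = [i for i, (_, d) in enumerate(ordered) if start <= d <= end]
    if not in_period:
        return None
    first, last = in_period[0], in_period[-1]
    base = ordered[first - 1][0] if first > 0 else None
    head = ordered[last][0]
    return base, head


def git_log_args(base, head):
    """Build a git command that lists one-line summaries for base..head."""
    rev = head if base is None else f"{base}..{head}"
    return ["git", "log", "--oneline", rev]
```

For a package tagged `v0.1.0` in August and `v0.2.0` and `v0.2.1` during a two-week window in September, this selects `v0.1.0..v0.2.1`, so the log covers everything released during the window.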

Also attached to the gist is its output in Markdown format, from a run a few minutes ago that summarizes changes over the past two weeks.

Other ideas are:
- summarize GitHub activity (PRs opened/merged/closed, issues opened/closed, text posted in issue/PR bodies and comments)
- parse code for API changes
- automatic text summarization (e.g. using belief propagation algorithms)

Thanks,

Jiahao Chen
Staff Research Scientist
MIT Computer Science and Artificial Intelligence Laboratory

Tim Holy

Sep 26, 2014, 12:13:43 PM
to juli...@googlegroups.com
That's starting to sound really useful. Nice work!

--Tim

On Friday, September 26, 2014 12:01:19 PM Jiahao Chen wrote:
> I've finished the next iteration of the digester script
> <https://gist.github.com/jiahao/4795b16cbd1c556acd35/bb5f36f176b58b1eac7039a8e45c3b20d0f4762a#file-pkgdigest-jl>,
> which also traverses into each
> package's repository to pull out one-line summaries of every commit between
> the most recent tagged version and the most recent tagged version _prior_
> to the oldest tagged version in the digest time range. (The idea is to find
> all the changes from the most recent version at the start of the digest
> period to the most recent version at the end of the digest period.)
>
> Attached to the gist also is its output in Markdown format
> <https://gist.github.com/jiahao/4795b16cbd1c556acd35#file-digest-md> when
> run a few minutes ago to summarize changes over the past two weeks.
>
> Other ideas are:
> - summarize Github activity (PRs opened/merged/closed, issues
> opened/closed, text posted in issue/PR bodies and comments)
> - parse code for API changes
> - automatic text summarization (e.g. using belief propagation algorithms)
>
> Thanks,
>
> Jiahao Chen
> Staff Research Scientist
> MIT Computer Science and Artificial Intelligence Laboratory
>
> On Thu, Sep 25, 2014 at 4:11 PM, Miles Lubin <miles...@gmail.com> wrote:
> > +1 for NEWS.md. This is probably the most effective way to communicate
> > breaking changes and new features to users versus digging through commit
> > logs/PRs/google groups. Hooking this into an automated infrastructure
> > would
> > be even better.
> >
> > On Thursday, September 25, 2014 3:20:50 PM UTC-4, Steve Kelly wrote:
> >> I think encouraging a NEWS.md would make this easier.
> >>
> >> On Thu, Sep 25, 2014 at 2:58 PM, Jiahao Chen <cji...@gmail.com> wrote:
> >>> I subscribe to Python Weekly <http://www.pythonweekly.com/> and find it
> >>> a very nice digest of new and notable things in Python land. I've wished
> >>> that something similar would happen for Julia, but I don't want the job
> >>> of
> >>> manually curating a newsletter. So instead, I wrote a script to see how
> >>> much of the job of creating a weekly digest can be automated.
> >>>
> >>> This gist <https://gist.github.com/jiahao/4795b16cbd1c556acd35>
> >>> contains a quick hack that summarizes changes to the Julia package
> >>> repository in the past week. It scans the local METADATA repository for
> >>> commit messages and identifies packages by name and version tag (the
> >>> commits have subjects like "Tag XXXX vN.N.N").
> >>>
> >>> - For newly registered packages (with commit subject "Register
> >>> XXXX..."), the script also downloads the package's README.md from its
> >>> Github repository and grabs a couple of paragraphs from the beginning
> >>> of
> >>> the file to provide a summary of what this package is about.
> >>> - For packages with updated version tags, the script will scan the

Stefan Karpinski

Sep 26, 2014, 2:29:22 PM
to Julia Dev
Very nice!

Jiahao Chen

Sep 29, 2014, 11:13:28 AM
to juli...@googlegroups.com
Here's another iteration that also grabs all recent issue activity.

Tim Holy

Sep 30, 2014, 5:49:49 AM
to juli...@googlegroups.com
Cool. I did notice that the version that grabs the issue activity has a
formatting issue starting about 60% down (inside Mustache).

--Tim

On Monday, September 29, 2014 08:13:28 AM Jiahao Chen wrote:
> Here's another iteration
> <https://gist.github.com/jiahao/4795b16cbd1c556acd35/92ffdedf55dd915b2b6a45431b723b3d16ffc119>
> that also grabs all recent issue activity
> <https://gist.github.com/jiahao/4795b16cbd1c556acd35/d5583e6440856a5048374f8775ec9aee36d785d0>.

Jiahao Chen

Oct 1, 2014, 6:08:47 PM
to juli...@googlegroups.com
Yes, this is a known issue with Markdown.jl.

Thanks,

Jiahao Chen
Staff Research Scientist
MIT Computer Science and Artificial Intelligence Laboratory

Miguel Gaspar

Oct 26, 2014, 5:12:07 PM
to juli...@googlegroups.com
Do you have any plans for running this regularly and making the digest available?

Miguel Gaspar

Oct 26, 2014, 5:28:18 PM
to juli...@googlegroups.com
I have just found that you already did it - somehow. Nice work with the Package Ecosystem Pulse!