Package search


Simon Hampton

Jun 30, 2017, 11:00:08 AM
to elm-dev
I don't know whether I am up to this yet but I would certainly like to explore the package library in more detail, and perhaps get some experience with the charting libraries in Elm that are now available.

https://github.com/elm-lang/projects/blob/master/README.md#package-search  talks about json-in and json-out for the search. How could I get a copy of the json-in part - presumably a list of packages and the github repos, with possibly data on downloads?

Simon


Evan Czaplicki

Jun 30, 2017, 1:18:02 PM
to elm-dev
Cool, that'll be a great project!

You can get a list of all known packages here.

If you want other chunks of data, I'd much rather give you a zip of info than have you try to extract it from the server with thousands of requests. So please let me know if you want all docs (or all whatever) at some point. To get started, you can download a few and test things out from links like this.

Not all data is tracked right now, so I'd recommend going to github for stars as a proxy for now. Once you demonstrate that it's useful, it'll be clearer how to add support for tracking that info.

Hope that helps, and let me know if you have any other questions!
Evan

--
You received this message because you are subscribed to the Google Groups "elm-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to elm-dev+unsubscribe@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/elm-dev/ea1a2d5c-f4f5-4f54-befe-cd40a3e21fd5%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Hot Belgo

Jul 1, 2017, 6:38:02 AM
to elm...@googlegroups.com
Thanks, that's a great start. Let's see how much progress the summer permits me to make.


Simon Hampton

Jul 11, 2017, 11:02:47 AM
to elm-dev
Hi all,

I've uploaded a first set of ideas - with just a first round of styling - to http://woebegone-tendency.surge.sh/. This would be to replace the main part of the existing page.

Note that the data is not guaranteed to be correct, and only 200 packages are included here. In this demo, all the data is hard coded into the Elm code, but in practice I would create JSON for the summaries, and then load details on request.

A proper backend will be needed - I'm thinking of using Firebase, and Firebase functions to keep it up to date. Alternatively we build on your existing backend - I think I know/could learn enough Haskell to do that.

I've tried to strike a balance between various objectives:

- providing an overview of the community as a whole. Hence a first page that lists repos by popularity (elm-lang/core is top of the full list) rather than the current somewhat arbitrary alphabetical order.
- enabling some comparison of packages, once you search for something (see https://groups.google.com/forum/#!topic/elm-discuss/ovfDyeB4Bhg for the challenges currently faced). The big missing item here is the number of downloads.
- humanising with the use of avatars (c.f. npmjs)
- quick install (c.f. npmjs)

Let me know what you think

Simon

N H

Jul 11, 2017, 12:05:01 PM
to elm...@googlegroups.com
The problem with listing via github stars is that it's not proportional to the actual usage of the package. The top five packages currently are not in heavy use in production. I don't think that's a great measurement to use. Useful to display, but it should not be the way things are sorted.

One good blog post or one good talk gets a lot of people starring repos that they never end up using -- we want to make it easier for Elm developers to first find the packages that solve their problems. A good example of this is elm-decode-pipeline, or elm-test. Both are almost universally used in production -- yet people still arrive and don't know which testing package to use, or ask about decoding things in a different way. Other packages, while awesome and interesting, are not as universally used as these.

Simon Hampton

Jul 11, 2017, 12:08:39 PM
to elm-dev

I do agree, but this is the best proxy for popularity at present. As I noted originally, I would love to add analytics to elm-package install


Richard Feldman

Jul 11, 2017, 12:39:25 PM
to elm...@googlegroups.com

> I do agree, but this is the best proxy for popularity at present. As I noted originally, I would love to add analytics to elm-package install

Yep, but I think it's better to have no signal than misleading signal. ;)

I appreciate the instinct to aim for doing better than the status quo, but my conclusion from looking at this is "we cannot do better than alphabetical sorting without tracking more information," which means a prerequisite for doing better than the status quo sorting is to track more information.

Evan Czaplicki

Jul 11, 2017, 4:45:44 PM
to elm-dev
This thread got a few messages that I want to moderate because we are losing the thread with this stars stuff.


Summary
  • Martin points out that I recommended bringing in GitHub star information earlier in this thread.
  • Frank points out that other ecosystems use download counts. He points to data from rubygems.

Useful Facts

I recommended using GitHub stars specifically because:

1) Download counts are kind of silly. Download counts primarily count CI builds. Someone may be working on a super interesting project with WebGL, working on it 4 hours a day, but if they do not do CI, download counts will register them as one-ever whereas someone who runs CI when they change their README will register as hundreds-per-week. Point is, download count is not actually a great metric for assessing usage in a refined way.

2) Tracking download counts will be tricky in the Elm ecosystem. With 0.19 you will install all packages into a per-user cache. That means that once someone has elm-lang/core 6.0.0 on their computer, they will never download it again. If you care about fast builds, working offline, minimizing bandwidth costs, etc. this is an obviously superior design. We will also be recommending that people preserve this cache in their CI systems. This amplifies the inherent silliness of download counts.
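
The two points above can be illustrated with a toy simulation. This is a sketch, not a measurement: the function name `count_downloads` and the build counts are invented for illustration, and the only behaviour taken from the thread is that a package version sitting in the cache is never fetched again.

```python
def count_downloads(builds: int, cache_persists: bool) -> int:
    """Count server-side downloads of one package version over `builds` builds.

    Hypothetical model: a build fetches the package only when it is not
    already in the local cache.
    """
    downloads = 0
    cached = False
    for _ in range(builds):
        if not cached:
            downloads += 1
            cached = True
        if not cache_persists:
            # e.g. a CI job that starts from a clean machine every time
            cached = False
    return downloads


# A developer building all day with a per-user cache registers once.
developer = count_downloads(builds=500, cache_persists=True)
# A CI setup that throws its cache away registers on every build.
ci_without_cache = count_downloads(builds=100, cache_persists=False)
print(developer, ci_without_cache)  # prints: 1 100
```

Under this model the count says more about cache policy than about how much a package is used.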

These facts imply that it would be better to compute a "rating" for packages based on a variety of information. In the outline of this project, I suggest using doc coverage, number of examples, number of infix ops, etc. to create a composite rating. We could use GitHub stars. We could use download counts. It doesn't really matter if you have a way of weighting N variables into a single composite rating.
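
One way to read "weighting N variables into a single composite rating" is a plain weighted average. The sketch below assumes every signal has already been normalised to the range 0 to 1; the signal names and weights are placeholders, not values proposed anywhere in this thread.

```python
from typing import Dict


def composite_rating(signals: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of signals, each assumed to be normalised to [0, 1].

    A signal that is missing for a package counts as 0.
    """
    total_weight = sum(weights.values())
    if total_weight == 0:
        return 0.0
    score = sum(weight * signals.get(name, 0.0) for name, weight in weights.items())
    return score / total_weight


# Hypothetical weights and signal values, for illustration only.
weights = {"doc_coverage": 2.0, "examples": 1.0, "few_infix_ops": 1.0}
signals = {"doc_coverage": 0.5, "examples": 1.0, "few_infix_ops": 1.0}
print(composite_rating(signals, weights))  # prints: 0.75
```

With this shape, bringing in another data source later (stars, downloads) means adding one more key to `weights` and `signals`, without touching the function.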

Finally, tracking more information blocks on me. It needs changes on our server that are not really necessary to do the majority of this project. So it is better to create a system that allows composite ratings and then bring in other data sources later.


Conclusion

We do not need to argue about this further. It doesn't really matter. Having a rating system is the important thing. There can be many data sources for that.



Christophe de Vienne

Jul 12, 2017, 6:21:46 AM
to elm...@googlegroups.com
Hi,

I am sorry if this looks like going a little off-topic, but there is one
thing I would like to say.

Le 11/07/2017 à 18:04, N H a écrit :
> The problem with listing via github stars is that it's not proportional
> to the actual usage of the package.

The thing bothering me with github stars is that it would bind the elm
ecosystem to github even more than it is today.

I, personally and professionally, use github as little as possible, and
if I want to create elm packages (which I do [1]), I feel forced to put
them on github. It is a pain to me and I had hoped that it would be less
the case in the future.


Christophe

[1] https://github.com/orus-io/elm-nats

--
Christophe de Vienne

Hot Belgo

Jul 12, 2017, 9:20:40 AM
to elm...@googlegroups.com
Ok, I'll have a go at a composite. This is my current thinking on ranking factors, but the weights remain to be seen.

Simon

I can see how **downloads** could be misleading, but if downloads are caused by being part of a CI process, that represents a package being used in a production environment, which is probably the best indication of quality that exists. In any event, we don't have that data so that leaves stars as the basis of any ranking.

**Dependency graph** Some libraries are used by other libraries and we could use that information. However, the libraries that library authors use may well be different from those of value to end users. Arguably, though, it is end-user libraries that will attract the most Github stars, so perhaps these criteria complement each other?
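
As a rough sketch of how the dependency-graph signal could be computed, the snippet below counts how many packages list each package as a direct dependency. It assumes the dependency lists have already been read from each package's elm-package.json; `alice/charts` and `bob/forms` are invented names, and the dependency lists are made up for illustration.

```python
from collections import Counter
from typing import Dict, List


def count_dependents(dependencies: Dict[str, List[str]]) -> Counter:
    """Map each package to the number of other packages that depend on it directly."""
    counts: Counter = Counter()
    for package, deps in dependencies.items():
        for dep in set(deps):  # ignore accidental duplicates
            if dep != package:
                counts[dep] += 1
    return counts


# Hypothetical input: package name -> direct dependencies.
dependencies = {
    "alice/charts": ["elm-lang/core", "elm-lang/svg"],
    "bob/forms": ["elm-lang/core", "elm-lang/html"],
    "elm-lang/svg": ["elm-lang/core"],
}
dependents = count_dependents(dependencies)
print(dependents.most_common(1))  # prints: [('elm-lang/core', 3)]
```

This only sees dependencies between published packages, so it measures what library authors build on, not what applications use.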

A bad recommendation would be a package that is **abandoned**. Shortly, every library author will need to decide whether to make the effort to update to 0.19, and that becomes a natural winnowing process. After that, a recent patch indicates maintenance, while a minor/major version change may suggest more active development. But are changes an indication of enhancements or of bugfixes? And a library like turqu/base64 needs no further work, since it does its job perfectly. That leads to perhaps the following index impacts:

```
                   Frequent updates    Infrequent updates
Many open issues    ++                  --
Few open issues     +                   +
```
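
The table above can be written as a lookup, sketched below. The mapping of "++", "+" and "--" to 2, 1 and -2 and both thresholds are assumptions made for illustration; the thread does not define what counts as "many" issues or "frequent" updates.

```python
from typing import Dict, Tuple

# (many_open_issues, frequent_updates) -> index adjustment, following the table.
IMPACT: Dict[Tuple[bool, bool], int] = {
    (True, True): 2,    # "++"
    (True, False): -2,  # "--"
    (False, True): 1,   # "+"
    (False, False): 1,  # "+"
}


def maintenance_adjustment(open_issues: int, releases_last_year: int,
                           issue_threshold: int = 20,
                           release_threshold: int = 2) -> int:
    """Look up the adjustment for a package; both thresholds are hypothetical."""
    many_issues = open_issues > issue_threshold
    frequent_updates = releases_last_year >= release_threshold
    return IMPACT[(many_issues, frequent_updates)]


print(maintenance_adjustment(open_issues=50, releases_last_year=0))  # prints: -2
```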

Having a 'recently updated' section would cater to those that continue to develop their libraries.

**Click data** I presume this does not exist today, but one thing the new packages.elm-lang.org should do is collect click-throughs. That's pretty straightforward with Google Analytics and is a ranking factor Google clearly finds useful. **Would a PR for the existing site be welcomed?**

**Docs coverage / Examples** My libraries tend to include the examples I used while developing the library. But examples in the Readme (README, ReadMe,...) or in the function notes are equally valuable for developers. Can any, let alone all, of these be surfaced reliably and objectively? If we counted the number of words in inline `{-| ... -}` docs, this would reward those that put their examples there.
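
Counting words in inline docs could look roughly like the sketch below. It assumes doc comments are never nested and that a regular expression is good enough; a real implementation would more likely read the documentation the package site already generates. The Elm sample and the function name `doc_word_count` are invented.

```python
import re

# Matches the body of an Elm doc comment: {-| ... -}
# Assumption: comments are not nested inside doc comments.
DOC_COMMENT = re.compile(r"\{-\|(.*?)-\}", re.DOTALL)


def doc_word_count(elm_source: str) -> int:
    """Total number of whitespace-separated words inside {-| ... -} comments."""
    return sum(len(body.split()) for body in DOC_COMMENT.findall(elm_source))


sample = """
{-| Double a number.

    double 2 == 4
-}
double : Int -> Int
double n =
    n * 2
"""
print(doc_word_count(sample))  # prints: 7
```

Note that the example line inside the comment counts toward the total, which is the reward for inline examples described above.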

An alternative would be to augment elm-package.json with a section:

    docs : [
        { type: Blog | Example | Readme
        , link: https://....
        }
    ]

This could be included in the detailed view, would probably attract developers' attention, and thus reward assiduous library authors (and could be rewarded with a small ranking boost).
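
A sketch of how a tool might read such a section, assuming it were written as ordinary JSON under a `docs` key. Neither the key nor the reader exists in elm-package.json today; `read_docs_section` and the example link are hypothetical.

```python
import json
from typing import Dict, List

ALLOWED_TYPES = {"Blog", "Example", "Readme"}


def read_docs_section(elm_package_json: str) -> List[Dict[str, str]]:
    """Return the well-formed entries of the proposed (hypothetical) docs section."""
    data = json.loads(elm_package_json)
    entries = []
    for entry in data.get("docs", []):
        if not isinstance(entry, dict):
            continue
        if entry.get("type") in ALLOWED_TYPES and isinstance(entry.get("link"), str):
            entries.append({"type": entry["type"], "link": entry["link"]})
    return entries


# Invented file contents; the link is a placeholder.
example = """
{
    "version": "1.0.0",
    "docs": [
        { "type": "Example", "link": "https://example.com/demo" },
        { "type": "Video", "link": "https://example.com/talk" }
    ]
}
"""
print(read_docs_section(example))
```

Entries with an unknown type are dropped here, so the second entry in the example is ignored.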

**Infix operators** While this would clearly fit completely with the Elm ethos, we may already be close to the point where the remaining infix operators can be objectively justified. I will try to develop a script to see how widespread they remain in libraries for 0.18.
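
Such a script might start from something like the sketch below, which looks for fixity declarations (`infixl`, `infixr`, `infix`) at the start of a line in 0.18 source. It will miss any operator defined without a fixity declaration, so it gives a lower bound. The Elm sample and the function name are invented.

```python
import re
from typing import List

# Fixity declarations in Elm 0.18 source, e.g. "infixl 0 =>".
INFIX_DECL = re.compile(r"^infix[lr]?\s+\d\s+(\S+)", re.MULTILINE)


def declared_operators(elm_source: str) -> List[str]:
    """List the operators that have a fixity declaration in this source text."""
    return INFIX_DECL.findall(elm_source)


sample = """
(=>) : a -> b -> ( a, b )
(=>) a b =
    ( a, b )

infixl 0 =>
"""
print(declared_operators(sample))  # prints: ['=>']
```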

I also thought about **version numbers**, but that's very arbitrary - e.g. elm-form went back to version 1.0.0 when elm-simple-form was renamed after reaching version 3.0.0.



Evan Czaplicki

Jul 12, 2017, 2:13:41 PM
to elm-dev
None of this ranking stuff is the important thing. You can just have a system with random ranks right now. Or alphabetical. The hard part is making the search work as a server endpoint that serves JSON. Please focus on that if you are interested in this project.

Whoever is doing moderation on this thread, please enforce this.

(If you are punishing "having issues open" you will punish my management style, which I have argued creates better libraries. Having infrequent releases is also something I consider a mark of quality because it indicates that folks are taking their time and considering their choices carefully. But again, none of this matters right now. There are many signals, and it's all open to interpretation. Ultimately, if I am going to point the package website at some search endpoint, I will want to review how it ranks data, so this is design work that inherently will happen later anyway.)


Evan Czaplicki

Jul 12, 2017, 3:21:13 PM
to elm-dev
Sorry that last email was not very constructive. When I am frustrated with a thread, I should try to do more to guide it in a better direction.

In that spirit, here are some questions that I think are more important concerns:
  • What language is the server implemented in? Am I familiar with it?
  • What is the JSON format of results?
  • What are the server endpoints? Can I ask for the 2nd page of results? How?
  • Is it computationally costly to run these queries? Can it not be?
The "minimum viable product" just replicates the existing functionality, making it practical to integrate and grow from there. That alone is a difficult problem in practice.
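
To make the JSON format and paging questions concrete, here is one possible shape, sketched under assumptions: the field names (`query`, `page`, `per_page`, `total`, `results`), the substring matching, and the package list are all invented, and the thread does not settle the server language. A real endpoint would wrap something like this in an HTTP handler.

```python
import json
from typing import Dict, List


def search(packages: List[Dict[str, str]], query: str,
           page: int = 1, per_page: int = 20) -> str:
    """Return one page of matches as a JSON string (hypothetical format)."""
    needle = query.lower()
    matches = [
        p for p in packages
        if needle in p["name"].lower() or needle in p["summary"].lower()
    ]
    page = max(page, 1)  # treat page 0 or negative pages as the first page
    start = (page - 1) * per_page
    return json.dumps({
        "query": query,
        "page": page,
        "per_page": per_page,
        "total": len(matches),
        "results": matches[start:start + per_page],
    })


# Invented package list; names and summaries are placeholders.
packages = [
    {"name": f"author/pkg-{i}", "summary": "charting helpers"}
    for i in range(5)
]
second_page = json.loads(search(packages, "chart", page=2, per_page=2))
print([p["name"] for p in second_page["results"]])
# prints: ['author/pkg-2', 'author/pkg-3']
```

Each query here is a single linear scan over the package list, which should stay cheap at the scale of a few hundred or a few thousand packages, though that is an estimate rather than something measured.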

If continued discussion is valuable, it will happen in a new thread.
This conversation is locked