Package metadata scraping

Matt D

Dec 22, 2021, 1:38:35 PM
to Co...@anaconda.com
Hi, I work on an air-gapped system that requires a form for everything installed. We set up environments on one system using conda, fill out the forms for submission, and then get the packages installed on the air-gapped system.

I've been working on automating this using a combination of conda.api.PrefixData and importlib.metadata.

conda.api.PrefixData.iter_records is not returning records for PyPI packages installed into the environment. Could someone point me to the best way to get records for the installed PyPI packages so I can process them with importlib.metadata?

I tried this on conda 4.10.3 and 4.11, and neither returns records for PyPI packages.
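For context, the conda-installed packages in an environment each have a JSON record in <prefix>/conda-meta/, and those files can be read with nothing but the standard library. A minimal sketch, assuming the usual record keys (name, version, url, license; license is not always present). pip-installed packages normally have no file there, which fits the behavior described above. The function names are hypothetical.

```python
import json
from pathlib import Path


def iter_conda_meta_records(prefix):
    """Yield the JSON records conda keeps for conda-installed packages.

    Assumes records live at <prefix>/conda-meta/<name>-<version>-<build>.json;
    the non-JSON 'history' file in that directory is skipped by the glob.
    """
    meta_dir = Path(prefix) / "conda-meta"
    for path in sorted(meta_dir.glob("*.json")):
        with path.open(encoding="utf-8") as fh:
            yield json.load(fh)


def summarize(record):
    # Keep only the fields a form is likely to ask for; .get() because
    # not every record carries every key (license in particular).
    return {
        "name": record.get("name"),
        "version": record.get("version"),
        "download_url": record.get("url"),
        "license": record.get("license"),
    }
```

Usage: [summarize(r) for r in iter_conda_meta_records("/path/to/env")].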




--
Matt Delengowski

Chris Barker - NOAA Federal

Dec 22, 2021, 2:39:07 PM
to Matt D, co...@anaconda.com
Hmm.

The fact is that conda packages and pip packages are fundamentally different. While conda does provide a few things to make them work together, I don't know that it should provide a full compatibility layer for this kind of thing.

My thoughts: 

1) make sure there are conda packages for everything your application needs. I’ve found that more stable and easier to maintain than mixing and matching. 

A bit of upfront work, yes, but pretty much anything pip-installable is easy to make a conda package for. 

And if you get it submitted to conda-forge, the maintenance is almost automatic. 

2) Most conda packages of Python packages are properly installed with pip (and so carry pip metadata), so you may be able to simply use the pip metadata and be done.

3) It sounds like what you're doing is not so bad: get the conda metadata from conda, and the pip metadata from pip.

One challenge is that the conda package name may or may not be the same as the pip package name :-)
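One way to line up the two name spaces is to compare PEP 503-normalized names and keep a small hand-maintained table for the known mismatches. A sketch under that assumption; the two table entries are illustrative examples (conda's pytorch is torch on PyPI, msgpack-python is msgpack), and the table would have to be built up for your own environments.

```python
import re

# Hypothetical, hand-maintained table of conda names whose PyPI
# distribution name differs. Keys are already PEP 503-normalized.
CONDA_TO_PYPI = {
    "pytorch": "torch",
    "msgpack-python": "msgpack",
}


def normalize(name):
    """PEP 503 normalization: runs of '-', '_' and '.' become '-', lowercased."""
    return re.sub(r"[-_.]+", "-", name).lower()


def pypi_name_for(conda_name):
    key = normalize(conda_name)
    return normalize(CONDA_TO_PYPI.get(key, key))


def match_names(conda_names, pip_names):
    """Map each conda name to the matching pip distribution name, or None."""
    pip_by_norm = {normalize(n): n for n in pip_names}
    return {c: pip_by_norm.get(pypi_name_for(c)) for c in conda_names}
```

Names that come back as None either have no pip counterpart (e.g. C libraries) or need a table entry.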

-CHB

Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

--
You received this message because you are subscribed to the Google Groups "conda - Public" group.
To view this discussion on the web visit https://groups.google.com/a/anaconda.com/d/msgid/conda/CAEMwYmEau0DXizBrNOqU5nA4Aahpw7r5Nsw%3DtZO3i6U-5C7Ezw%40mail.gmail.com.

Matt D

Dec 22, 2021, 6:17:30 PM
to Chris Barker - NOAA Federal, co...@anaconda.com
Hi Chris, thanks for the reply.


So I absolutely do not expect conda to parse PyPI metadata for me. In fact, I can already do that with importlib.metadata.

The issue is that unless the environment I want to scrape is the active one, I need the path to its site-packages directory to use importlib.metadata.
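Given that path, importlib.metadata can read another environment's packages without activating it: distributions() accepts a path= list that replaces sys.path for that one search. A minimal sketch, assuming you already know the site-packages directory (on Linux/macOS typically <prefix>/lib/pythonX.Y/site-packages) and Python 3.8+. The keys are standard core-metadata fields, but which ones a package fills in varies, hence .get(). scrape_site_packages is a hypothetical name.

```python
from importlib import metadata


def scrape_site_packages(site_packages):
    """Return one dict of form-relevant fields per distribution found in
    site_packages, without that environment being the active one."""
    rows = []
    # path= replaces sys.path for this search only.
    for dist in metadata.distributions(path=[str(site_packages)]):
        md = dist.metadata
        rows.append({
            "name": md.get("Name"),
            "version": md.get("Version"),
            "description": md.get("Summary"),
            "home_page": md.get("Home-page"),
            "license": md.get("License"),
            # Project-URL may repeat, e.g. "Source, https://...".
            "project_urls": md.get_all("Project-URL") or [],
        })
    return sorted(rows, key=lambda r: (r["name"] or "").lower())
```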

Now, I was playing around with conda.api.PrefixData._internals._pip_internal_interop and was able to get conda.api.PrefixData.iter_records to provide records for the PyPI packages. These records, of course, do not have the full set of metadata that a conda record would, but they at least provide the path to the site-packages installation so I can use importlib.metadata.

What I am asking is that conda at least provide these PyPI records without my needing to dig around in the internals.

Judging by the comments in the source file, it seems that _pip_internal_interop is slated for removal in the future. I would prefer that it not be.

Honestly, I would expect conda.api.PrefixData.iter_records to provide a record for every package listed by the command:

conda list 
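Since conda list does show pip-installed packages, one stopgap is to ask it for JSON and split the entries by origin. A sketch, assuming that pip-installed entries in conda list --json output carry "channel": "pypi" (as they do in conda 4.x, to my knowledge); only the parsing step runs without conda, and conda_list_json needs a conda executable on PATH. Both function names are hypothetical.

```python
import json
import subprocess


def conda_list_json(prefix):
    """Run `conda list --json` for one prefix; requires conda on PATH."""
    out = subprocess.run(
        ["conda", "list", "--json", "--prefix", str(prefix)],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)


def split_by_origin(entries):
    """Split conda list --json entries into (conda_pkgs, pypi_pkgs).

    Assumption: pip-installed entries are tagged with channel == "pypi".
    """
    conda_pkgs, pypi_pkgs = [], []
    for entry in entries:
        target = pypi_pkgs if entry.get("channel") == "pypi" else conda_pkgs
        target.append(entry)
    return conda_pkgs, pypi_pkgs
```

The pypi_pkgs names can then be handed to importlib.metadata, as described above.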


From: Chris Barker - NOAA Federal <chris....@noaa.gov>
Sent: Wednesday, December 22, 2021, 2:39 PM
To: Matt D
Cc: co...@anaconda.com
Subject: Re: [conda] Package metadata scraping

Chris Barker

unread,
Dec 23, 2021, 12:10:58 PM12/23/21
to Matt D, co...@anaconda.com
On Wed, Dec 22, 2021 at 3:17 PM Matt D <matt.del...@gmail.com> wrote:
The issue I have is that unless I have the current env I want to scrape active,

ahh -- I see. Yes, I was assuming you'd do your analysis in an active environment. This does bring up one more reason why you'd want to be able to activate an environment in a script, which is hard to do (there's a very long-running GitHub issue about it -- I've lost track, but it may be possible).

There is conda execute -- then you could run your metadata gathering script in an activated environment on the fly.
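For the narrower task of reading package metadata, activation may not be strictly needed: running the environment's own interpreter already puts that environment's site-packages on sys.path, so importlib.metadata inside it sees the right packages. A sketch, assuming the conventional interpreter location (<prefix>/bin/python on Linux/macOS, <prefix>/python.exe on Windows) and Python 3.8+ in the target environment; anything that depends on activation scripts setting environment variables is not covered. env_python and list_distributions are hypothetical helpers.

```python
import json
import subprocess
import sys
from pathlib import Path

# Script executed *inside* the target interpreter; prints [name, version]
# pairs for every distribution on that interpreter's own sys.path.
_PROBE = (
    "import json\n"
    "from importlib import metadata\n"
    "rows = {(d.metadata.get('Name') or '', d.version or '')"
    " for d in metadata.distributions()}\n"
    "print(json.dumps(sorted(rows)))\n"
)


def env_python(prefix):
    """Conventional interpreter path inside a conda prefix (assumption)."""
    prefix = Path(prefix)
    if sys.platform == "win32":
        return prefix / "python.exe"
    return prefix / "bin" / "python"


def list_distributions(python_exe):
    """Run the probe with the given interpreter; return [[name, version], ...]."""
    out = subprocess.run(
        [str(python_exe), "-c", _PROBE],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)
```

Usage: list_distributions(env_python("/path/to/env")).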

What I am asking is that conda at least provide these pypi records without me needing to dig around with the internals. 

Exactly what metadata are you looking for?

Honestly I would expect conda.api.PrefixData.iter_records to provide a record of every package that gets listed by the command

Sorry -- you're getting out of my depth here -- good luck!

-CHB


Matt D

Dec 29, 2021, 10:52:34 AM
to conda - Public, chris....@noaa.gov, co...@anaconda.com, Matt D
Hope your holiday is going nicely, Chris.

On Thursday, December 23, 2021 at 12:10:58 PM UTC-5 chris....@noaa.gov wrote:
ahh -- I see. Yes, I was assuming you'd do your analysis in an active environment. This does bring up one more reason why you'd want to be able to activate an environment in a script, which is hard to do (there's a very long-running GitHub issue about it -- I've lost track, but it may be possible).
There is conda execute -- then you could run your metadata gathering script in an activated environment on the fly.

  Please see my final reply below.
 
exactly what metadata are you looking for?

  For the forms I currently have to submit:
  • Description
  • Name
  • Download URL
  • Download Name
  • Version
  • Platform (linux, etc.)
  • License type and/or license
  • Home Page URL
  • Source Code URL
  There might be more, but this is what I remember off the top of my head. The important point is that all of this metadata can be found for each package type:
  • Conda
    • about.json
    • repodata_record.json
    • meta.yaml (aka the conda recipe, if available)
  • PyPI
    • <prefix>/lib/python<X.Y>/site-packages/<package>-<version>.dist-info/METADATA
      • Or via importlib.metadata, which provides the content of METADATA; I just need to know the
        • location of site-packages
          • The record provided by conda.api.PrefixData.iter_records has this path
        • package name
    • PyPI JSON API request (https://pypi.org/pypi/<package>/json)
   I can already get all of this information for conda packages (specifically, their paths) using the conda API as it currently stands. I can get conda to provide the same information for PyPI packages, but it requires digging into internals, as described below.
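On the conda side, the extracted copy of each package in the package cache (pkgs_dir) has an info/ directory: about.json holds fields from the recipe's about section, and repodata_record.json holds the channel record, including the download URL and file name. A sketch of merging the two into one form row, assuming a directory such as <pkgs_dir>/<name>-<version>-<build>/ (conda-meta records usually point to it via extracted_package_dir). Keys like summary, home, dev_url and license are typical but optional, so everything is read with .get(). conda_form_row is a hypothetical name.

```python
import json
from pathlib import Path


def _read_json(path):
    # Missing files are common (e.g. packages built without about.json).
    try:
        with open(path, encoding="utf-8") as fh:
            return json.load(fh)
    except FileNotFoundError:
        return {}


def conda_form_row(extracted_pkg_dir):
    """Merge info/about.json and info/repodata_record.json into one row."""
    info = Path(extracted_pkg_dir) / "info"
    about = _read_json(info / "about.json")
    record = _read_json(info / "repodata_record.json")
    return {
        "name": record.get("name"),
        "version": record.get("version"),
        "description": about.get("summary") or about.get("description"),
        "download_url": record.get("url"),
        "download_name": record.get("fn"),
        "platform": record.get("subdir"),
        "license": about.get("license") or record.get("license"),
        "home_page": about.get("home"),
        "source_url": about.get("dev_url"),
    }
```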

Sorry -- you're getting out of my depth here -- good luck!

 Not a problem!

Maybe I am misunderstanding what conda declares as public vs. not. I was just going by what's described at https://docs.conda.io/projects/conda/en/latest/api/index.html

Specifically, conda.api.PrefixData.iter_records.

By default, that method does not return records for PyPI packages. I was looking at the source code.


If you follow that, it can indeed provide records for PyPI packages in the environment, but you have to set a private attribute to True.


Looking at that source code, it seems conda plans to deprecate that feature. I don't know why, but I think it would be beneficial to have it True by default, or at least to provide a public attribute/method to enable it. If it were public, I wouldn't need to use conda execute to get this information.

So, all in all, conda already has the capability; it just seems to disable it by default.