ReadParquet error


Shweta Ramdas

Oct 27, 2020, 10:30:17 AM
to PrediXcan/MetaXcan
Hello,

I'm trying to run the MetaXcan tutorial (integrating GTEx v8 models with CAD).

I'm using python 3.6.3. I'm running into an error when I run the imputation step.

python $GWAS_TOOLS/gwas_summary_imputation.py \
-by_region_file $DATA/eur_ld.bed.gz \
-gwas_file $OUTPUT/harmonized_gwas/CARDIoGRAM_C4D_CAD_ADDITIVE.txt.gz \
-parquet_genotype $DATA/reference_panel_1000G/chr1.variants.parquet \
-parquet_genotype_metadata $DATA/reference_panel_1000G/variant_metadata.parquet \
-window 100000 \
-parsimony 7 \
-chromosome 1 \
-regularization 0.1 \
-frequency_filter 0.01 \
-sub_batches 10 -sub_batch 0 \
--standardise_dosages \
-output $OUTPUT/summary_imputation/CARDIoGRAM_C4D_CAD_ADDITIVE_chr1_sb0_reg0.1_ff0.01_by_region.txt.gz

I run into the following error:
INFO - Beginning process
INFO - Creating context by variant
INFO - Loading study
INFO - Loading variants' parquet file
Traceback (most recent call last):
  File "summary-gwas-imputation-master/src//gwas_summary_imputation.py", line 97, in <module>
    run(args)
  File "summary-gwas-imputation-master/src//gwas_summary_imputation.py", line 60, in run
    results = run_by_region(args)
  File "summary-gwas-imputation-master/src//gwas_summary_imputation.py", line 40, in run_by_region
    context = SummaryImputationUtilities.context_by_region_from_args(args)
  File "predixcan/MetaXcan/summary-gwas-imputation-master/src/genomic_tools_lib/summary_imputation/Utilities.py", line 229, in context_by_region_from_args
    study = load_study(args)
  File "predixcan/MetaXcan/summary-gwas-imputation-master/src/genomic_tools_lib/summary_imputation/Utilities.py", line 162, in load_study
    study = Parquet.study_from_parquet(args.parquet_genotype, args.parquet_genotype_metadata, chromosome=args.chromosome)
  File "predixcan/MetaXcan/summary-gwas-imputation-master/src/genomic_tools_lib/file_formats/Parquet.py", line 218, in study_from_parquet
    _v = pq.ParquetFile(variants)
  File "~/.local/lib/python3.6/site-packages/pyarrow/parquet.py", line 201, in __init__
    read_dictionary=read_dictionary, metadata=metadata)
  File "pyarrow/_parquet.pyx", line 1021, in pyarrow._parquet.ParquetReader.open
  File "pyarrow/error.pxi", line 99, in pyarrow.lib.check_status
OSError: Couldn't deserialize thrift: TProtocolException: Exceeded size limit

--------------------------------------------------------------------------------------

I get the error even after increasing the memory limit (up to 200 GB). If I run the same command with chr22 in place of chr1, this error disappears, but I then get a different error and an empty output file:
INFO - Error for region (22,15927607.0,17193405.0): AttributeError("'pyarrow.lib.ChunkedArray' object has no attribute 'name'",)

Any help with this would be much appreciated.

Thank you!
Shweta


Shweta Ramdas

Oct 27, 2020, 11:15:06 AM
to PrediXcan/MetaXcan
I realised, after reading some of the older posts, that it's because I was using a newer version of pyarrow.

The error disappears with pyarrow 0.9.0.

Thanks!
Shweta
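
A quick way to confirm which pyarrow version the interpreter actually picks up before launching the imputation step is a small check like the one below. It is only a sketch: the target of 0.9.0 comes from the report above, and the helper names (parse_version, matches_known_good) are made up for illustration and are not part of MetaXcan or pyarrow.

```python
import re

# Version reported as working in this thread; treated here as the target.
KNOWN_GOOD = (0, 9, 0)


def parse_version(text):
    """Return the leading numeric components of a version string as ints."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError("unrecognised version string: %r" % text)
    return tuple(int(part) for part in match.group(0).split("."))


def matches_known_good(text, known_good=KNOWN_GOOD):
    """True when the first three version components equal the target release."""
    parsed = parse_version(text)
    padded = parsed + (0,) * (3 - len(parsed))  # "1.0" is compared as (1, 0, 0)
    return padded[:3] == known_good


if __name__ == "__main__":
    try:
        import pyarrow
    except ImportError:
        print("pyarrow is not installed in this environment")
    else:
        status = "ok" if matches_known_good(pyarrow.__version__) else "differs from 0.9.0"
        print("pyarrow %s: %s" % (pyarrow.__version__, status))
```

Running it with the same python that runs gwas_summary_imputation.py matters, because a copy of pyarrow under ~/.local (as in the traceback above) can shadow the one installed elsewhere.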

Alvaro Barbeira

Oct 27, 2020, 12:10:11 PM
to Shweta Ramdas, PrediXcan/MetaXcan
Hi Shweta,

I'm happy to hear you solved your issue.
Unfortunately, we have no plans at the moment to update the code to work with newer versions of pyarrow.

Best,

Alvaro


Kate Elliott

Jun 29, 2023, 10:53:40 AM
to PrediXcan/MetaXcan
How did you manage to switch to pyarrow 0.9.0? I can't get it to install on my system: pip falls back to the source tarball and the build fails.

(base) [vwy332@rescomp1 TWAS]$ pip uninstall pyarrow
Found existing installation: pyarrow 12.0.1
Uninstalling pyarrow-12.0.1:
  Would remove:
    /gpfs3/users/jknight/vwy332/.local/lib/python3.9/site-packages/pyarrow-12.0.1.dist-info/*
    /gpfs3/users/jknight/vwy332/.local/lib/python3.9/site-packages/pyarrow/*
Proceed (Y/n)? Y
  Successfully uninstalled pyarrow-12.0.1
(base) [vwy332@rescomp1 TWAS]$ pip install pyarrow==0.9.0
Defaulting to user installation because normal site-packages is not writeable
Collecting pyarrow==0.9.0
  Using cached pyarrow-0.9.0.tar.gz (8.5 MB)
    ERROR: Command errored out with exit status 1:
     command: /apps/eb/2020b/ivybridge/software/Anaconda3/2022.05/bin/python -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-e4r29t4k/pyarrow_36d0cf74ffca44f7957968cf7bf9aaba/setup.py'"'"'; __file__='"'"'/tmp/pip-install-e4r29t4k/pyarrow_36d0cf74ffca44f7957968cf7bf9aaba/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-pip-egg-info-m2v0int2
         cwd: /tmp/pip-install-e4r29t4k/pyarrow_36d0cf74ffca44f7957968cf7bf9aaba/
    Complete output (31 lines):
    /apps/eb/2020b/ivybridge/software/Anaconda3/2022.05/lib/python3.9/site-packages/setuptools/installer.py:27: SetuptoolsDeprecationWarning: setuptools.installer is deprecated. Requirements should be satisfied by a PEP 517 installer.
      warnings.warn(

    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-install-e4r29t4k/pyarrow_36d0cf74ffca44f7957968cf7bf9aaba/setup.py", line 473, in <module>
        setup(
      File "/apps/eb/2020b/ivybridge/software/Anaconda3/2022.05/lib/python3.9/site-packages/setuptools/__init__.py", line 87, in setup
        return distutils.core.setup(**attrs)
      File "/apps/eb/2020b/ivybridge/software/Anaconda3/2022.05/lib/python3.9/site-packages/setuptools/_distutils/core.py", line 109, in setup
        _setup_distribution = dist = klass(attrs)
      File "/apps/eb/2020b/ivybridge/software/Anaconda3/2022.05/lib/python3.9/site-packages/setuptools/dist.py", line 462, in __init__
        _Distribution.__init__(
      File "/apps/eb/2020b/ivybridge/software/Anaconda3/2022.05/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 293, in __init__
        self.finalize_options()
      File "/apps/eb/2020b/ivybridge/software/Anaconda3/2022.05/lib/python3.9/site-packages/setuptools/dist.py", line 886, in finalize_options
        ep(self)
      File "/apps/eb/2020b/ivybridge/software/Anaconda3/2022.05/lib/python3.9/site-packages/setuptools/dist.py", line 907, in _finalize_setup_keywords
        ep.load()(self, ep.name, value)
      File "/tmp/pip-install-e4r29t4k/pyarrow_36d0cf74ffca44f7957968cf7bf9aaba/.eggs/setuptools_scm-7.1.0-py3.9.egg/setuptools_scm/integration.py", line 91, in version_keyword
        _assign_version(dist, config)
      File "/tmp/pip-install-e4r29t4k/pyarrow_36d0cf74ffca44f7957968cf7bf9aaba/.eggs/setuptools_scm-7.1.0-py3.9.egg/setuptools_scm/integration.py", line 60, in _assign_version
        maybe_version = _get_version(config)
      File "/tmp/pip-install-e4r29t4k/pyarrow_36d0cf74ffca44f7957968cf7bf9aaba/.eggs/setuptools_scm-7.1.0-py3.9.egg/setuptools_scm/__init__.py", line 153, in _get_version
        parsed_version = _do_parse(config)
      File "/tmp/pip-install-e4r29t4k/pyarrow_36d0cf74ffca44f7957968cf7bf9aaba/.eggs/setuptools_scm-7.1.0-py3.9.egg/setuptools_scm/__init__.py", line 87, in _do_parse
        parse_result = _call_entrypoint_fn(config.absolute_root, config, config.parse)
      File "/tmp/pip-install-e4r29t4k/pyarrow_36d0cf74ffca44f7957968cf7bf9aaba/.eggs/setuptools_scm-7.1.0-py3.9.egg/setuptools_scm/_entrypoints.py", line 49, in _call_entrypoint_fn
        return fn(root)
      File "/tmp/pip-install-e4r29t4k/pyarrow_36d0cf74ffca44f7957968cf7bf9aaba/setup.py", line 457, in parse_version
        describe = setuptools_scm.git.DEFAULT_DESCRIBE + " --match 'apache-arrow-[0-9]*'"
    TypeError: can only concatenate list (not "str") to list
    ----------------------------------------
WARNING: Discarding https://files.pythonhosted.org/packages/be/2d/11751c477e4e7f4bb07ac7584aafabe0d0608c170e4bff67246d695ebdbe/pyarrow-0.9.0.tar.gz#sha256=7db8ce2f0eff5a00d6da918ce9f9cfec265e13f8a119b4adb1595e5b19fd6242 (from https://pypi.org/simple/pyarrow/). Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
ERROR: Could not find a version that satisfies the requirement pyarrow==0.9.0 (from versions: 0.9.0, 0.10.0, 0.11.0, 0.11.1, 0.12.0, 0.12.1, 0.13.0, 0.14.0, 0.15.1, 0.16.0, 0.17.0, 0.17.1, 1.0.0, 1.0.1, 2.0.0, 3.0.0, 4.0.0, 4.0.1, 5.0.0, 6.0.0, 6.0.1, 7.0.0, 8.0.0, 9.0.0, 10.0.0, 10.0.1, 11.0.0, 12.0.0, 12.0.1)
ERROR: No matching distribution found for pyarrow==0.9.0
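
One reading of the log above is that pip found no prebuilt wheel of pyarrow 0.9.0 for this Python 3.9 interpreter, fell back to the source tarball, and then failed inside setup.py. That reading is an inference from the "Using cached pyarrow-0.9.0.tar.gz" line, not something confirmed by the maintainers. The sketch below flags that situation in captured pip output; the helper names are hypothetical and the sample text is copied from the log above.

```python
import re
import sys


def downloaded_artifacts(pip_output):
    """Return the file names pip reported in 'Using cached' or 'Downloading' lines."""
    pattern = re.compile(r"^\s*(?:Using cached|Downloading)\s+(\S+)", re.MULTILINE)
    return pattern.findall(pip_output)


def is_source_build(filename):
    """Wheels end in .whl; .tar.gz and .zip archives are source distributions."""
    return filename.endswith((".tar.gz", ".zip"))


# Two lines copied from the pip output in this thread.
sample = """Collecting pyarrow==0.9.0
  Using cached pyarrow-0.9.0.tar.gz (8.5 MB)
"""

for name in downloaded_artifacts(sample):
    kind = "source distribution" if is_source_build(name) else "wheel"
    print(name, "->", kind)

# The original poster reported success on Python 3.6.3; print what is running here.
print("interpreter: %d.%d" % sys.version_info[:2])
```

If the artifact is a source distribution, installing into an environment with an older Python, such as the 3.6 series the original poster used, is the more likely route than trying to build 0.9.0 under 3.9.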

Festus

Jun 29, 2023, 5:52:40 PM
to PrediXcan/MetaXcan
Hi,

Please create a conda environment by following this guide: https://github.com/hakyimlab/MetaXcan/tree/master#example-conda-environment-setup

Then activate the environment and run the tools from inside it.