The module was designed with simplicity and usability in mind. The code is pure Python and kept deliberately simple to get students participating in our Geophysics classes and exercises going with Python and seismic data. The code is not meant to offer all the functionality most likely required in a commercial processing environment. Although best performance, highest throughput and a minimal memory footprint are not at the heart of this module, we have tried to keep these topics in mind and use, for instance, memory-mapped I/O where possible. The module has been used successfully to analyze and read SEG-Y data sets of approx. 10 TB in size.
There are quite a few great Python packages available to read and/or write seismic data, in particular when given as SEG-Y files. Many of them are, however, from our perspective inherently designed to primarily deal with 3D poststack data leading toward seismic interpretation. Some assume a certain 3D inline/crossline geometry, others can only read certain pre-sorted data sets, or the reading of SEG-Y data seems to have been added later but was never the primary goal in the first place, and therefore compromises were made. The seisio module at hand tries to avoid making any assumptions about the geometry and allows a user to read 2D and 3D pre- and poststack data in various flexible ways.
Note: As it stands, SEG-Y or SU data need to have a constant trace length. The SEG-Y standard allows the number of samples to vary trace by trace - this makes reading seismic data from disk rather inefficient, though. The module could easily be changed to work with varying trace lengths if necessary; we would simply have to scan the whole file sequentially first to store the number of samples per trace and the byte offset within the file at which each trace starts. Such an approach would be similar to reading SEG2 data, where trace pointers are stored explicitly.
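The sequential scan just described boils down to simple byte arithmetic. A minimal sketch (the helper below is hypothetical, not part of seisio, and assumes a SEG-Y rev. 1 layout with a 3600-byte file header, 240-byte trace headers and 4-byte samples, i.e., format 5):

```python
def trace_offsets(samples_per_trace, file_header=3600, trace_header=240,
                  sample_size=4):
    """Return the byte offset at which each trace starts in the file."""
    offsets = []
    pos = file_header
    for nsamples in samples_per_trace:
        offsets.append(pos)
        pos += trace_header + nsamples * sample_size
    return offsets

# Three traces with 1000, 500 and 750 samples, respectively:
print(trace_offsets([1000, 500, 750]))
```

Once such a table exists, any trace can be seeked to directly, which is exactly what explicit SEG2 trace pointers provide for free.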
That's it, you're done. The variable dataset is a Numpy structured array that contains all the trace headers and the data themselves (don't try this with a large data set unless you have plenty of RAM available - large data sets should be read in a different way, see below). The code will figure out the type of seismic file from the suffix of the file name - if your file comes with an unusual suffix or no suffix at all, you may have to specify the file type explicitly (e.g., filetype="SGY").
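As an illustration of what such a structured array looks like (the header mnemonics below follow common SU/SEG-Y conventions but are chosen for illustration; seisio's actual dtype may differ):

```python
import numpy as np

# Sketch of a trace dataset as a Numpy structured array: each element
# holds a few trace headers plus the trace samples themselves.
ns = 1000  # samples per trace
dtype = np.dtype([("tracl", "i4"), ("offset", "i4"),
                  ("sx", "i4"), ("gx", "i4"),
                  ("data", "f4", (ns,))])
dataset = np.zeros(3, dtype=dtype)    # three traces
dataset["offset"] = [100, 200, 300]   # headers are accessed by name
```

Here dataset["data"] is a (3, 1000) array of the trace samples, so headers and samples travel together in one object.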
Creating a file is also quite simple. Writing data in big-endian byte order after (re-)calculating the offset header value from the source and receiver group x-coordinates (assuming here that we deal with a 2D seismic line and can ignore the y-components) simply requires:
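The offset recalculation itself amounts to the following (a sketch with made-up coordinates; the seisio write call is not shown here):

```python
import numpy as np

# (Re-)calculate the offset header from the source (sx) and receiver
# group (gx) x-coordinates of a 2D line, ignoring the y-components.
# The signed distance is the conventional SEG-Y offset value, so
# split-spread geometries yield negative offsets on one side.
sx = np.array([1000, 1000, 1000])
gx = np.array([1200, 1400, 600])
offset = gx - sx
print(offset)
```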
This would create a SEG-Y rev. 1.0 file (the default if no revision is explicitly requested) using IEEE floats (format 5) in big-endian byte order, and the textual header would be encoded as EBCDIC. The init() method would create a default textual and binary file header for you (similar to SU's segyhdrs command), but you could of course also get a template, create your own file headers, or clone file headers from another file and then pass them to the init() method, together with any extended textual header records (if applicable). The finalize() method would write any trailer records (if applicable; to be user-supplied as arguments); as a last step, it would re-write the SEG-Y binary file header to reflect the correct number of traces or trailers in the file.
Trace headers would automatically be transferred from the SU trace header table (input) to the SEG-Y trace header table (output). This is relatively straightforward as the majority of mnemonics are identical, but SU-specific trace headers like d1 or f2 would be dropped. If they need to be preserved, a custom-made SEG-Y trace header definition JSON file would have to be provided that contains these header mnemonics (so they can be matched), or these header mnemonics would have to be remapped using the remap={"from": "to"} parameter (a dictionary) of the write_traces() method.
Theoretically, the init() and finalize() methods could be made obsolete by forcing the user to provide all required file headers, extended file headers and/or trailer records when creating the output object. This has deliberately been avoided as it allows users to get header templates via
that are already pre-filled with required information (such as the data format, the number of samples, the sampling interval, the SEG-Y revision number, the fixed-trace-length flag, header stanzas, and so on). It is perhaps a matter of personal preference, but the current choice seems somewhat more user-friendly and more robust in terms of setting all values required by the SEG-Y standard correctly.
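Conceptually, such a pre-filled binary file header template is just a mapping of mnemonics to values. An illustrative sketch (the key names below are common SEG-Y mnemonics picked for illustration, not necessarily seisio's own):

```python
# Hypothetical pre-filled binary-header template; the real template
# would be produced by seisio and contain all standard-mandated fields.
binary_template = {
    "format": 5,      # data sample format code: IEEE float
    "hns": 1000,      # number of samples per trace
    "hdt": 2000,      # sampling interval (microseconds)
    "rev": 0x0100,    # SEG-Y revision 1.0
    "flen": 1,        # fixed-trace-length flag
}
# The user only tweaks what differs from the defaults:
binary_template["hdt"] = 4000
```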
One key feature of seisio is the ability to read data in arbitrary order. In order to achieve this, we need to scan all trace headers and create a lookup index. If you would like to read prestack data grouped by the xline and iline trace headers, with each ensemble sorted by offset, but you would also like the offset range to be restricted to a maximum of 4000 m, then this could be achieved as follows:
This would loop through all individual ensembles one at a time, and each ensemble would have traces with the same xline-iline combination sorted by increasing (which is the default) offset, but no offset value in any of the ensembles would be greater than 4000 m. Obviously, for large data sets, holding the lookup index in memory, although restricted to the minimum number of trace header mnemonics required, possibly requires quite some memory, i.e., there is some overhead. This is where seismic data stored as HDF5 (or NETCDF4) files comes in (another of our Python modules), where trace headers can be readily accessed and analyzed without loading them into memory by Python modules like "dask" or "vaex".
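Internally, such a lookup index boils down to sorting and filtering scanned trace headers. A pure-Numpy sketch of the logic (independent of seisio's actual implementation, using a tiny made-up header table):

```python
import numpy as np

# Scanned trace headers: (iline, xline, offset) per trace.
headers = np.array([(1, 1, 4500), (1, 1, 200), (1, 2, 100),
                    (1, 1, 3000), (1, 2, 50)],
                   dtype=[("iline", "i4"), ("xline", "i4"),
                          ("offset", "i4")])

keep = headers["offset"] <= 4000                        # restrict offsets
idx = np.argsort(headers, order=("iline", "xline", "offset"))
idx = idx[keep[idx]]                                    # selected trace numbers

# Group the sorted, filtered indices into (iline, xline) ensembles:
ensembles = {}
for i in idx:
    key = (int(headers["iline"][i]), int(headers["xline"][i]))
    ensembles.setdefault(key, []).append(int(headers["offset"][i]))
print(ensembles)
```

Each ensemble now carries its trace numbers in increasing-offset order, which is all that is needed to read the traces from disk group by group.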
which allows you to get multiple batches of traces from the seismic file (in this case, we would read 3 blocks of 2 traces, the first block would start at trace number 0, and the first trace in each block would be 4 traces from the first trace in the previous block, i.e., we would read trace numbers 0, 1, 4, 5, 8, and 9). A very simple way of looping through a file is as follows:
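The batch arithmetic just described can be expressed compactly as index math (a sketch only; the actual seisio read call is not shown, and the helper name is made up):

```python
import numpy as np

def batch_indices(start, nblocks, blocksize, step):
    """Trace numbers for nblocks blocks of blocksize consecutive
    traces, with block starts spaced `step` traces apart."""
    starts = start + step * np.arange(nblocks)          # 0, 4, 8
    return (starts[:, None] + np.arange(blocksize)).ravel()

print(batch_indices(0, 3, 2, 4))   # the example from the text
```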
SEG2 data sets are often relatively small, or there are individual SEG2 files for the survey's shots. SEG2 strings in the descriptor blocks often (at least in practical terms) do not comply with the SEG2 standard (many companies add their own strings), i.e., reading SEG2 data files into Numpy structured arrays with strict types, or parsing SEG2 strings to put values (of a certain type) into a SEG-Y-like trace header table, is complicated or sometimes not even possible, resulting in errors or loss of information. Therefore, when reading SEG2 data files, the seisio module returns the traces as a standard 2D Numpy array together with a separate Pandas dataframe holding the strings and values contained in the trace descriptor blocks.
The trace lengths can vary; the module will scan for the maximum number of samples per trace and allocate a Numpy array accordingly, padding shorter traces with zeros where necessary. The actual number of samples per trace is stored as an additional string in the Pandas dataframe. Example:
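The padding step itself is straightforward. A minimal sketch with two made-up traces of different lengths:

```python
import numpy as np

# Traces of varying length, as read from a SEG2 file:
traces = [np.array([1.0, 2.0, 3.0]), np.array([4.0, 5.0])]

# Allocate for the maximum trace length and right-pad with zeros:
nsmax = max(len(t) for t in traces)
data = np.zeros((len(traces), nsmax), dtype=np.float32)
for i, t in enumerate(traces):
    data[i, :len(t)] = t
print(data)
```

The original per-trace sample counts must be kept alongside (in seisio's case, in the Pandas dataframe) so the padding can be distinguished from real zero samples.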
By including SEGY-SAK in your toolbox you will be able to load or transform the original binary SEG-Y data into more accessible and Python-friendly formats. It leverages the work of Segyio to simplify loading of common SEG-Y formats into xarray.Dataset objects for ease of use, and into NetCDF4 files for better on-disk and large-file performance using dask. Tools to help users create new volumes and to return data to SEG-Y are also included.
segysak was originally conceived out of a need for a better interface to SEG-Y data in Python. The groundwork was laid by Tony Hallam, but development really began during the Transform 2020 Software Underground Hackathon, held online across the world due to the cancellation of the EAGE Annual in June of that year. Significant contributions during the hackathon were made by Steve Purves, Gijs Straathof, Fabio Contreras and Alessandro Amato del Monte.
Significant updates were made at Transform 2021. Multiple new and advanced examples were released. A 2-hour video tutorial and notebook demonstrating key functionality and introducing Xarray for seismic applications were streamed. Experimental ZGY support was introduced.
For reading, the access mode r is preferred. All write operations will raise an exception. For writing, the mode r+ is preferred (as rw would truncate the file). Any mode with w will raise an error. The modes used are standard C file modes; please refer to that documentation for a complete reference.
If ignore_geometry=True, segyio will not try to build iline/xline or other geometry-related structures, which leads to faster opens. This is essentially the same as using strict=False on a file that has no geometry.
Create a new segy file with the geometry and properties given by spec. This enables creating SEGY files from your data. The created file supports all segyio modes, but has an emphasis on writing. The spec must be complete, otherwise an exception will be raised. A default, empty spec can be created with segyio.spec().
Very little data is written to the file, so just calling create is not sufficient to re-read the file with segyio. Rather, every trace header and trace must be written to the file to be considered complete.
The segyio.spec() function will default sorting, offsets and everything in the mandatory group, except format and samples, and requires the caller to fill in all the fields in either of the exclusive groups.
If any field is missing from the first exclusive group, and the tracecount is set, the resulting file will be considered unstructured. If the tracecount is set, and all fields of the first exclusive group are specified, the file is considered structured and the tracecount is inferred from the xlines/ilines/offsets. The offsets are defaulted to [1] by segyio.spec().
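For instance, filling in the mandatory fields plus the first exclusive group might look like the following configuration sketch (requires segyio to be installed; as noted earlier, every trace header and trace must still be written before the file is complete):

```python
import segyio

# Fill the mandatory group (format, samples) and the first exclusive
# group (ilines/xlines; offsets are defaulted to [1] by segyio.spec()),
# so the file is structured and the tracecount is inferred as 4 * 3.
spec = segyio.spec()
spec.format = 1                   # 4-byte IBM float
spec.samples = list(range(25))    # 25 samples per trace
spec.ilines = [1, 2, 3, 4]
spec.xlines = [11, 12, 13]

with segyio.create("new.sgy", spec) as f:
    pass  # write every trace header and trace here
```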