Using a C preprocessor with gyp?

Dale Curtis

Oct 2, 2012, 11:45:26 PM
to gyp-de...@googlegroups.com, Ronald Bultje
All,

I've got a bit of an unusual problem for which gyp is giving me a headache :(. Ronald and I are working to build FFmpeg through MSVC++; unfortunately, MSVC++ doesn't support the C99 syntax used by FFmpeg. To work around this, Ronald created a preprocessor that converts the C99 to C89, which we can then compile with MSVC++. Getting this working in GYP has been an uphill battle thus far, to say the least. Based on discussions with evanm, we figured asking a wider audience for help would be wise.

Basically, we have an executable which we want to call for each file in a list of input .c files; it generates another set of .c files which should be treated as sources to be compiled. How can we do this?

Setting up a rule for extension '.c' with the action to run the preprocessor and generate an intermediate '.c' file doesn't work. It fails because a "sources" entry matching a rule doesn't remove the entry from the list of sources to be compiled. So even though preprocessing completes successfully, the final "sources" list includes the original file and the preprocessed output.
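A toy model makes the failure mode concrete. This is not GYP's code, just a hypothetical function (apply_rule) mimicking the behavior described above for a rule whose outputs are processed as sources: each matching input yields an output, and the input itself is never dropped.

```python
import posixpath


def apply_rule(sources, extension, output_dir):
    """Toy model of a rule with its outputs processed as sources.

    Every source with a matching extension yields one output that is
    appended to the list; the matching input is NOT removed.
    """
    outputs = []
    for src in sources:
        root, ext = posixpath.splitext(posixpath.basename(src))
        if ext == "." + extension:
            outputs.append(posixpath.join(output_dir, root + "." + extension))
    return sources + outputs


# The original .c stays in the list next to its converted copy,
# so both would be compiled.
print(apply_rule(["libavcodec/h264.c"], "c", "gen"))
# prints ['libavcodec/h264.c', 'gen/h264.c']
```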

That leaves us with creating an action that calls a Python script or similar with an input file list (the list is too long for the command line, hence the file list...). That sucks for a few reasons:
- It's ugly: it means duplicating our input sources list as an output sources list with the minor modification of prepending each entry with the intermediate path.
- It's slow: it loses the parallelism inherent in ninja and other parallel build tools.
- It's extra slow: Touching one file means rerunning the action for all files.
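For the action route, the file list and the derived output list are both mechanical. A minimal sketch with hypothetical helper names, assuming a one-path-per-line list file and forward-slash paths mirrored under an intermediate directory:

```python
import posixpath


def write_file_list(path, sources):
    """Write one source path per line, keeping the action's command line short."""
    with open(path, "w") as f:
        for src in sources:
            f.write(src + "\n")


def read_file_list(path):
    """Read the list back inside the action script, skipping blank lines."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


def derive_outputs(sources, intermediate_dir):
    """Mirror each input path under the intermediate directory."""
    return [posixpath.join(intermediate_dir, src) for src in sources]
```

For example, derive_outputs(["libavcodec/h264.c"], "gen/c89") returns ["gen/c89/libavcodec/h264.c"]. The derivation is trivial, which is exactly why maintaining the second list by hand feels so redundant.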

On top of all this, the preprocessor needs to invoke "cl.exe /P" to create a preprocessed source, which it can then translate to C89 using libclang. This means we need to invoke "cl.exe" twice for every source file. Ideally, if the user is using a distributed compiler such as goma or distcc, we would like to invoke that instead... however, as far as I can tell, there's no access to $(CC) in gyp.
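If the environment that runs the build exports a compiler variable, a wrapper script could honor it itself. A sketch, assuming a CC environment variable that may hold a launcher plus compiler (e.g. a hypothetical "gomacc cl.exe"), falling back to plain cl.exe when it's unset:

```python
import os
import shlex


def compiler_command(env=None, default="cl.exe"):
    """Return the compiler invocation as an argv prefix.

    CC may hold a launcher plus compiler, so it is split like a shell
    command line. POSIX-style splitting treats backslashes as escapes,
    so Windows paths inside CC would need quoting.
    """
    env = os.environ if env is None else env
    cc = env.get("CC", "").strip()
    return shlex.split(cc) if cc else [default]
```

A wrapper would then build its preprocessing step as compiler_command() + ["-P", ...], so the same script works with or without a distributed compiler.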

I'm hopeful the experts here can show me a better way to approach the problem. evanm mentioned that the NaCl team is facing similar issues on Windows. Any suggestions would be greatly appreciated. Thanks in advance!

- dale

David Turner

Oct 3, 2012, 9:22:06 AM
to Dale Curtis, gyp-de...@googlegroups.com, Ronald Bultje
Hello Dale,

Are you really going to work actively on the ffmpeg sources, or just use them as-is from a small number of known public releases?
What's the benefit of running the preprocessor on each Chrome build?
Do you lose any optimization opportunities by converting to C89?

If the answer to the last question is "no", what about simply using the once-converted c89 version of the ffmpeg sources in your gyp build?

In other words, do the conversion only once when you need to prepare a new set of ffmpeg sources.

Mark Mentovai

Oct 3, 2012, 10:37:02 AM
to Dale Curtis, gyp-developer, Ronald Bultje
Can you make the extension of the source be something other than .c?

Dale Curtis

Oct 3, 2012, 1:21:44 PM
to Mark Mentovai, gyp-developer, Ronald Bultje
Unfortunately, both ideas come with their own headaches.

Checking in the preprocessed code:
- Keeps us in the same situation we have today, where if anyone makes a change, someone with a Windows box has to go upload new binaries; the "binaries" would just be new preprocessed sources instead.
- Requires multiple copies of large preprocessed source files. Since the preprocessing collapses header includes and we need to build Chrome and Chromium, we need two copies of the preprocessed sources.

Changing the extension of the sources: we can't just rename them, since this needs to compile on Linux/Mac where the preprocessor isn't required. We can't just use a "copies" section either, since it collapses paths and doesn't support renames. There are still a few ways this could be done, but none are nice:
- Keep a full extra copy of the renamed source code in the tree. Requires resyncing on changes.
- Have a copy action which generates a renamed copy of the entire source tree (need header files, etc which aren't explicitly listed) in the intermediate directory. Not as bad as a preprocess action, but still has the same drawbacks.
- Have a copy action which generates renamed copies of only the files which need preprocessing within the source tree. Not horrible with a .gitignore, but it still has the same touch-one-recopy-all behavior.

In the latter two copy cases the ugly factor is high, since we need to list the original sources, the copied source names, and the output source names, all of which are the original list with a different prefix on each entry.
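The copy-action variants need a small script because "copies" can't preserve directories or rename. A hypothetical sketch that copies listed files into an output tree, keeping relative paths and swapping the extension (".c99" is only an example):

```python
import os
import shutil


def copy_renamed(sources, src_root, out_root, new_ext=".c99"):
    """Copy each source into out_root, keeping its path relative to
    src_root but replacing its extension. Returns the paths written."""
    written = []
    for src in sources:
        rel = os.path.relpath(src, src_root)
        dest = os.path.join(out_root, os.path.splitext(rel)[0] + new_ext)
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        # copyfile (not copy2) gives the copy a fresh mtime, so the build
        # tool sees it as newer than its source.
        shutil.copyfile(src, dest)
        written.append(dest)
    return written
```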

One thing I thought of last night was a GYP change adding a new flag to the rule section, similar to "process_outputs_as_sources", called "remove_input_on_rule_match". It does what's on the label: when a rule is matched, it removes the input from the original sources list. Alternatively, supporting "sources!" inside the rule section would also work. Disclaimer: I have no idea if either is possible with the GYP model.

- dale

Mark Mentovai

Oct 3, 2012, 1:59:59 PM
to Dale Curtis, gyp-developer, Ronald Bultje
You can change the extension of the sources without hurting Mac and Linux. You’d keep the rule to transform .whatever to .c on all platforms, but you’d have the rule effectively perform a file copy on platforms where no transformation is necessary.
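In other words, the rule's action script branches on platform. A minimal sketch; transform() and the injected convert callable (standing in for the Windows-only cl -P plus c99conv pipeline) are hypothetical:

```python
import shutil
import sys


def transform(in_path, out_path, platform=sys.platform, convert=None):
    """Produce the .c file that gets compiled from the renamed source.

    Windows: run the C99-to-C89 conversion via the supplied callable.
    Elsewhere: the compiler already accepts C99, so copy the file as-is.
    """
    if platform.startswith("win"):
        if convert is None:
            raise RuntimeError("Windows needs a converter (cl -P + c99conv)")
        convert(in_path, out_path)
    else:
        shutil.copyfile(in_path, out_path)
```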

Dale Curtis

Oct 3, 2012, 2:04:27 PM
to Mark Mentovai, gyp-developer, Ronald Bultje
Unfortunately, that will also lead to a whole host of upstream vs downstream management conflicts. E.g., merges and diffs vs upstream will be problematic.

- dale

Mark Mentovai

Oct 3, 2012, 2:39:33 PM
to Dale Curtis, Bradley Nelson, gyp-developer, Ronald Bultje
I don’t know how well MSVC would handle it if you had a target that the real ffmpeg target depended on, whose job it is to perform this translation, but using .c as the source and destination extension. Brad might know how MSVC would tolerate that. Given your constraints, it seems like that might be the best option.

It’d be better if your preprocessor worked on a single .c or .h at a time so you wouldn’t have to deal with messy dependency problems that this might cause.

Dale Curtis

Oct 3, 2012, 3:32:56 PM
to Mark Mentovai, Bradley Nelson, gyp-developer, Ronald Bultje
The preprocessor does work on a single .c at a time. Its syntax is essentially "c99wrap.exe cl.exe <compile command line> <in> <out>".
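Given that syntax, a replacement wrapper would first split its arguments. A sketch, assuming (as the syntax suggests) that the compiler comes first and the last two arguments are always the input and output paths; parse_c99wrap_args is a hypothetical name:

```python
def parse_c99wrap_args(argv):
    """Split [compiler, flags..., in, out] into (compiler, flags, in, out).

    argv excludes the wrapper's own name (i.e. sys.argv[1:]).
    """
    if len(argv) < 3:
        raise ValueError("usage: c99wrap <compiler> [flags...] <in> <out>")
    return argv[0], argv[1:-2], argv[-2], argv[-1]


print(parse_c99wrap_args(["cl.exe", "-O2", "in.c", "out.obj"]))
# prints ('cl.exe', ['-O2'], 'in.c', 'out.obj')
```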

Can you elaborate on your suggestion a bit? I'm not quite clear.

Also, will a target with a 'none' type and a sources list still run a compilation step? If not, having that target preprocess into an intermediate dir and then having the main target take the intermediate-dir sources as input would be awesome.

- dale

Mark Mentovai

Oct 3, 2012, 3:40:22 PM
to Dale Curtis, Bradley Nelson, gyp-developer, Ronald Bultje
“c99wrap cl”—so c99wrap is running cl? Does it just read the .c, or does it also read any headers that the .c #includes?

You wouldn’t call the list of files to operate on in the none-type target “sources”, you’d probably stick it in some variable. You’d probably need a wrapper script to drive your c99wrap the way it wants to be driven. The wrapper would probably take the list of files to operate on as arguments.

Mark Mentovai

Oct 3, 2012, 4:08:51 PM
to Ronald Bultje, Dale Curtis, Bradley Nelson, gyp-developer
OK, so c99conv operates on preprocessed source. The best we can do is to have something that regenerates all of the .c files any time any .c or .h file changes. To make this work, you’d want a target like this:

{
  'target_name': 'ffmpeg_c89_convert',
  'type': 'none',
  'actions': [
    {
      'action_name': 'ffmpeg_c89_convert',
      'inputs': [
        'file1.c',
        'file2.c',
        # ...
        'file9.h',  # list all headers too
      ],
      'outputs': [
        '<(SHARED_INTERMEDIATE_DIR)/ffmpeg_c89_convert/file1.c',
        '<(SHARED_INTERMEDIATE_DIR)/ffmpeg_c89_convert/file2.c',
        # ...
        # don't list any headers
      ],
      'action': [
        'python',
        'ffmpeg_c89_convert.py',
        '<(SHARED_INTERMEDIATE_DIR)/ffmpeg_c89_convert',
        '<@(_inputs)',
      ],
    },
  ],
},
{
  'target_name': 'ffmpeg',
  'type': 'static_library',
  'dependencies': [
    'ffmpeg_c89_convert',
  ],
  'sources': [
    '<(SHARED_INTERMEDIATE_DIR)/ffmpeg_c89_convert/file1.c',
    '<(SHARED_INTERMEDIATE_DIR)/ffmpeg_c89_convert/file2.c',
    # ...
  ],
}

ffmpeg_c89_convert.py is a script invoked as ffmpeg_c89_convert.py <output_dir> <input_files> that performs a loop over all input_files; for each one ending in .c, it sets output_file to <output_dir>/<name>, and runs cl -P <input_file> -Fi<temp_file> followed by c99conv <temp_file> <output_file>.
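A minimal sketch of the ffmpeg_c89_convert.py described above. Assumptions: cl and c99conv are on PATH; the cl flags needed for a faithful preprocess (include paths, defines) are omitted and would have to be added; and, as described, each output keeps only its input's basename, so two inputs with the same basename would collide.

```python
import os
import subprocess
import sys
import tempfile


def convert_commands(output_dir, input_files, temp_file):
    """Return (preprocess, convert) argv pairs, one pair per .c input.

    Headers are skipped here; they are listed as action inputs only so
    that the build tool reruns this script when a header changes.
    """
    commands = []
    for input_file in input_files:
        if not input_file.endswith(".c"):
            continue
        output_file = os.path.join(output_dir, os.path.basename(input_file))
        commands.append((
            ["cl", "-P", input_file, "-Fi" + temp_file],  # C99, fully preprocessed
            ["c99conv", temp_file, output_file],          # C99 -> C89
        ))
    return commands


def main(argv):
    """Usage: ffmpeg_c89_convert.py <output_dir> <input_files...>"""
    output_dir, input_files = argv[0], argv[1:]
    os.makedirs(output_dir, exist_ok=True)
    fd, temp_file = tempfile.mkstemp(suffix=".c")
    os.close(fd)
    try:
        for preprocess, convert in convert_commands(output_dir, input_files, temp_file):
            subprocess.check_call(preprocess)  # raises on failure, failing the build
            subprocess.check_call(convert)
    finally:
        os.remove(temp_file)
    return 0


# Only run when invoked with arguments, as the GYP action does.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```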

The ffmpeg static library target depends on the ffmpeg_c89_convert target, so all of the files that it wants to compile (in SHARED_INTERMEDIATE_DIR) are produced by the ffmpeg_c89_convert target running the ffmpeg_c89_convert.py script.

The ffmpeg_c89_convert target lists all of the source files in ffmpeg as inputs, and all of the .c files that ffmpeg_c89_convert.py produces as outputs. If any output is missing, or if any input is newer than any output, the entire ffmpeg_c89_convert.py script runs again. It’s important to list the headers as inputs to ffmpeg_c89_convert in addition to the .c files, because any header change might affect the preprocessed output.
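The rerun condition above (any output missing, or any input newer than any output) reduces to comparing the newest input against the oldest output. An illustrative sketch of that check, not code from GYP or any generator:

```python
import os


def action_is_stale(inputs, outputs):
    """True if any output is missing or any input is newer than the oldest output."""
    if not outputs or any(not os.path.exists(p) for p in outputs):
        return True
    oldest_output = min(os.path.getmtime(p) for p in outputs)
    newest_input = max((os.path.getmtime(p) for p in inputs), default=0.0)
    return newest_input > oldest_output
```

Since every header is an input of the single action, touching any one of them makes every output stale at once, which is the cost of this design.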

You take c99wrap out of the equation, so that you can deal with the “create source compileable by cl” step (cl -P and c99conv) distinctly from the “compile source with cl” step, allowing the latter to be handled as natively (and normally) as possible.

All of this can be simplified slightly by using GYP variables so that you only need to list file1.c, file2.c, and file9.h in one location in the file.


On Wed, Oct 3, 2012 at 3:53 PM, Ronald Bultje <rbu...@google.com> wrote:
Hi Mark,

c99wrap.exe, when compiling a source file into an object file,
internally does the following:

A) cl.exe -P [rest of input commandline syntax] in.c -Fitempfile1.c
B) c99conv.exe tempfile1.c tempfile2.c
C) cl.exe -c [rest of input commandline syntax] tempfile2.c -Foout.o

tempfile1.c is the preprocessed file (containing C99 syntax) in which all
macros and includes are resolved.
tempfile2.c is the C89 version of the C99 file tempfile1.c;
c99conv.exe is the tool that does this conversion.
(For those not familiar with cl.exe command-line syntax: -Fi<name> is
the output filename for the preprocessor; -Fo<name> is the output
filename for the object file.)
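Steps A to C map onto three argument lists. A sketch with a hypothetical function name, where flags stands for "[rest of input commandline syntax]"; the temp-file names default to the ones used above:

```python
def c99wrap_steps(flags, in_path, obj_path,
                  temp1="tempfile1.c", temp2="tempfile2.c"):
    """Return the argv lists for steps A, B and C."""
    preprocess = ["cl.exe", "-P"] + flags + [in_path, "-Fi" + temp1]    # A
    convert = ["c99conv.exe", temp1, temp2]                              # B
    compile_obj = ["cl.exe", "-c"] + flags + [temp2, "-Fo" + obj_path]  # C
    return preprocess, convert, compile_obj
```

Only A and C invoke cl.exe, which is why those are the two steps worth routing through goma.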

Ideally, we'd like both step (A) as well as step (C) to happen
remotely when using goma.

Ronald

Dale Curtis

Oct 3, 2012, 5:09:54 PM
to Mark Mentovai, Ronald Bultje, Bradley Nelson, gyp-developer
What you're describing is what I was trying to describe as my "python based action" in the original email, and it's kind of crufty for the reasons mentioned there.

As it turns out, setting 'type' to 'none' with a sources list and a rule works splendidly. I have one target which includes the original .c files and outputs the converted .c files in the intermediate directory. Another target depends on this target and runs the actual compilation step. This gets us all the parallelism and incremental-build advantages of GYP!

- dale

Mark Mentovai

Oct 3, 2012, 5:16:28 PM
to Dale Curtis, Ronald Bultje, Bradley Nelson, gyp-developer
Implemented as you’re now describing, your .c files won’t be rebuilt when you touch an .h file that they depend on. That seems pretty bad to me.

Dale Curtis

Oct 3, 2012, 5:18:32 PM
to Mark Mentovai, Ronald Bultje, Bradley Nelson, gyp-developer
That can be fixed by including the '.h' files in the sources list right?

- dale

Mark Mentovai

Oct 3, 2012, 5:25:12 PM
to Dale Curtis, Ronald Bultje, Bradley Nelson, gyp-developer
No, but you might be able to fix it by including all '.h' files in an 'inputs' list in the rule definition. (You’d also want to list your tool in inputs, so that when your tool changes, it gets a chance to run again).

“sources in a none-type target” might break the linter. Make sure you try this by running gyp with --check.
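Combining Dale's 'none'-type target with the 'inputs' suggestion, the GYP side might look roughly like this. It's a sketch: the target and rule names, the ffmpeg_headers variable, and the per-file c99_convert_one.py script are hypothetical, while RULE_INPUT_PATH, RULE_INPUT_ROOT, SHARED_INTERMEDIATE_DIR and process_outputs_as_sources are standard GYP names. Outputs go to SHARED_INTERMEDIATE_DIR so the dependent target can refer to them.

```python
{
  'variables': {
    'ffmpeg_headers': [
      'libavcodec/file9.h',
    ],
  },
  'targets': [
    {
      'target_name': 'ffmpeg_c89_convert',
      'type': 'none',
      'sources': [
        'libavcodec/file1.c',
        'libavcodec/file2.c',
      ],
      'rules': [
        {
          'rule_name': 'convert_c99_to_c89',
          'extension': 'c',
          # Headers and the tool itself are extra inputs, so touching
          # either reruns the rule for every source.
          'inputs': [
            'c99_convert_one.py',
            '<@(ffmpeg_headers)',
          ],
          'outputs': [
            '<(SHARED_INTERMEDIATE_DIR)/ffmpeg_c89/<(RULE_INPUT_ROOT).c',
          ],
          'action': [
            'python', 'c99_convert_one.py',
            '<(RULE_INPUT_PATH)',
            '<(SHARED_INTERMEDIATE_DIR)/ffmpeg_c89/<(RULE_INPUT_ROOT).c',
          ],
          # Nothing is compiled in a 'none' target; the next target does it.
          'process_outputs_as_sources': 0,
        },
      ],
    },
    {
      'target_name': 'ffmpeg',
      'type': 'static_library',
      'dependencies': ['ffmpeg_c89_convert'],
      'sources': [
        '<(SHARED_INTERMEDIATE_DIR)/ffmpeg_c89/file1.c',
        '<(SHARED_INTERMEDIATE_DIR)/ffmpeg_c89/file2.c',
      ],
    },
  ],
}
```

Because RULE_INPUT_ROOT drops the directory, two sources with the same file name in different directories would map to the same output; a real tree may need a path-preserving naming scheme.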

Dale Curtis

Oct 3, 2012, 5:38:35 PM
to Mark Mentovai, Ronald Bultje, Bradley Nelson, gyp-developer
Gotcha. Headers are rarely modified, so sticking them in the 'inputs' list, such that we only hit the worst-case behavior (reconvert and recompile everything) on a header change, is okay.

gyp_chromium --check passes without error.

- dale

Mark Mentovai

Oct 3, 2012, 5:40:48 PM
to Dale Curtis, Ronald Bultje, Bradley Nelson, gyp-developer
Great! Sounds like you’re set, then.

Evan Martin

Oct 3, 2012, 6:00:16 PM
to Mark Mentovai, Dale Curtis, Ronald Bultje, Bradley Nelson, gyp-developer
Out of curiosity, couldn't you use "process_outputs_as_sources" on the action and avoid the intermediate target?

Mark Mentovai

Oct 3, 2012, 6:07:26 PM
to Evan Martin, Dale Curtis, Ronald Bultje, Bradley Nelson, gyp-developer
Requiring that the inputs to the rule have extension .c prevents the target type from being a real compiles-files type. none-type targets don’t compile anything.

Bradley Nelson

Oct 4, 2012, 8:41:41 PM
to Mark Mentovai, Evan Martin, Dale Curtis, Ronald Bultje, gyp-developer
Is this squared away? Sorry, I've been out sick...

Dale Curtis

Oct 4, 2012, 8:44:58 PM
to Bradley Nelson, Mark Mentovai, Evan Martin, Ronald Bultje, gyp-developer
Yeah, I've got everything mostly working now. The trick was a 'none'-type target with a rule for files with a 'c' extension.

I'm still looking for better ways to detect goma, though. Right now I'm walking up the directory tree looking for the chromium.gyp_cmd file and extracting CC from it. While hacky, goma cuts the build time from 3 minutes to 30 seconds, so it seems worth it.
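The directory walk could be sketched as follows. The thread doesn't show chromium.gyp_cmd's format, so the parsing is an assumption: it looks for a shell-style CC=<value> token. Both function names are hypothetical.

```python
import os
import shlex


def find_upwards(start_dir, filename):
    """Return the first filename found in start_dir or an ancestor, else None."""
    current = os.path.abspath(start_dir)
    while True:
        candidate = os.path.join(current, filename)
        if os.path.isfile(candidate):
            return candidate
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            return None
        current = parent


def cc_from_gyp_cmd(path):
    """Return the value of a CC=... token (assumed file format), else None."""
    with open(path) as f:
        for token in shlex.split(f.read()):
            if token.startswith("CC="):
                return token[len("CC="):]
    return None
```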

- dale