Plans to support more fine grained SIMD instructions

Mostafa Mokhtar

Apr 8, 2014, 3:28:51 AM
to yeppp-...@googlegroups.com
In C++ I have been using this vector library a lot: http://www.agner.org/optimize/vectorclass.pdf, and I really like the set of intrinsics it exposes.
Does Yeppp! have plans to support such data types?
If not, what is the process to add them myself for personal use?

Or are the Yeppp! classes implemented the way they are because of the overhead of calls into JNI?

Thanks
Mostafa

Marat Dukhan

Apr 8, 2014, 2:03:34 PM
to Mostafa Mokhtar, yeppp-...@googlegroups.com
Hi Mostafa,

There are several reasons why Yeppp! functions have "long vector" interface:
  • Performance. With long vectors, Yeppp! can hide the latency of individual instructions through loop unrolling and software pipelining.
  • Portability. Long vector operations have the same semantics regardless of the available instruction sets. E.g. a floating-point vector addition function would use AVX on recent x86-64 processors, SSE on older processors, NEON on capable ARM processors, or even software floating-point emulation on old ARM cores, but the function prototype is the same on all platforms.
  • Compatibility with languages other than C/C++. Some languages, such as Java and Rust, do not support SIMD data types; others, like C#, D, and Julia, might support only a limited subset of SIMD extensions, or use different calling conventions than C/C++. However, nearly all programming languages support some mechanism for calling C libraries with array arguments.
There are no plans for a "short-vector" interface in Yeppp!, except for short-vector elementary functions for compiler auto-vectorization (which probably won't be exposed to users).
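
The portability point can be pictured with a small Python sketch. It is illustrative only: add_v64f, _BACKENDS, and the feature names "wide" and "baseline" are hypothetical and not part of Yeppp!. The idea is that the caller always sees one array-based entry point, while the kernel behind it is picked from whatever the platform supports.

```python
# Illustrative sketch only; none of these names come from Yeppp!.

def _add_scalar(x, y, out, length):
    # Portable fallback: one element per iteration.
    for i in range(length):
        out[i] = x[i] + y[i]

def _add_unrolled4(x, y, out, length):
    # Stand-in for a SIMD kernel: four elements per iteration, then a scalar tail.
    i = 0
    while i + 4 <= length:
        out[i] = x[i] + y[i]
        out[i + 1] = x[i + 1] + y[i + 1]
        out[i + 2] = x[i + 2] + y[i + 2]
        out[i + 3] = x[i + 3] + y[i + 3]
        i += 4
    while i < length:
        out[i] = x[i] + y[i]
        i += 1

# Hypothetical feature table, ordered from most to least preferred.
_BACKENDS = [("wide", _add_unrolled4), ("baseline", _add_scalar)]

def add_v64f(x, y, out, length, features=("baseline",)):
    """Single entry point; the kernel is chosen from the available features."""
    for name, kernel in _BACKENDS:
        if name in features:
            kernel(x, y, out, length)
            return name
    raise RuntimeError("no usable backend")

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [10.0, 20.0, 30.0, 40.0, 50.0]
out_a = [0.0] * 5
out_b = [0.0] * 5
used_a = add_v64f(x, y, out_a, 5, features=("baseline",))
used_b = add_v64f(x, y, out_b, 5, features=("wide", "baseline"))
```

Both calls use the same signature and fill the output array with the same values; only the kernel that ran differs.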

Regards,
Marat



Mostafa Mokhtar

Apr 8, 2014, 2:17:09 PM
to Marat Dukhan, yeppp-...@googlegroups.com
Hi Marat,

Yes, what you say definitely makes sense.
How can I extend the current methods in Core to leverage other SIMD instructions for Long Vectors?
The current functions in Core have good coverage, but it is not sufficient.

Marat Dukhan

Apr 9, 2014, 5:39:02 PM
to Mostafa Mokhtar, yeppp-...@googlegroups.com
Yeppp! is mostly code-generated from Python scripts to enable code reuse.

Functions in the Core module are generated by the /codegen/core.py script.
E.g. look at the function generate_min, which generates a group of yepCore_Min_* functions:

First, it defines a function group using the Python 'with' construct:
    with yeppp.module.Function(module, 'Min', 'Minimum') as function:

Min is the group name (used as the file name for some of the generated files, and for grouping in the documentation),
Minimum is the description (used for documentation generation).
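
For readers unfamiliar with the 'with' construct, here is a minimal sketch of how an object can act as a function group inside a with block. FunctionGroup is a hypothetical stand-in written for this illustration; it is not the real yeppp.module.Function, whose internals are not shown in this thread.

```python
# Hypothetical stand-in for a function-group object; not the real yeppp.module.Function.
class FunctionGroup:
    def __init__(self, module, name, description):
        self.module = module
        self.name = name                # group name, e.g. "Min"
        self.description = description  # human-readable description, e.g. "Minimum"
        self.signatures = []
        self.closed = False

    def __enter__(self):
        # The object bound by "as function" is whatever __enter__ returns.
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Runs when the block ends, even on error; a generator could finalize output here.
        self.closed = True
        return False  # do not swallow exceptions

    def generate(self, signature):
        # This sketch only records the request instead of emitting code.
        self.signatures.append(signature)

with FunctionGroup("Core", "Min", "Minimum") as function:
    function.generate("yepCore_Min_V8s_S8s(v, minimum, YepSize length: length != 0)")
```

The point of the pattern is that everything belonging to one group is configured inside the block, and cleanup runs automatically when the block exits.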

Inside the with block, it sets the required properties and invokes code generation:
function.java_documentation = """
@brief	Computes the minimum of %(InputType0)s array elements.
"""
function.c_documentation = """
@brief	Computes the minimum of %(InputType0)s array elements.
@param[in]	v	Pointer to the array of elements whose minimum will be computed.
@param[out]	minimum	Pointer to the variable where the minimum will be stored.
@param[in]	length	Length of the array specified by @a v. Must be non-zero.
"""
The above specifies the Doxygen-style documentation for the functions that will be generated (documentation is parametrised by their argument types).
c_documentation is used to generate docs for the C, FORTRAN, and C# bindings with pointer arguments.
java_documentation is used to generate docs for the Java and C# bindings with array arguments.
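
The %(InputType0)s placeholders use Python's standard mapping-based string formatting. The sketch below shows how one template line can be rendered for several types. The template line is taken from the thread; the render helper and the substituted descriptions are assumptions, since the thread does not show what the generator actually fills in.

```python
# The template line is copied from the thread; the substituted descriptions are
# assumptions, since the thread does not show what the generator fills in.
template = "@brief\tComputes the minimum of %(InputType0)s array elements."

def render(template, **parameters):
    # Python's % operator looks up each %(name)s placeholder in the mapping.
    return template % parameters

rendered = [
    render(template, InputType0="signed 8-bit integer"),
    render(template, InputType0="double-precision floating-point"),
]
for line in rendered:
    print(line)
```

The same mechanism would apply to the C template further down, where the placeholder appears inside identifiers such as Yep%(InputType0)s.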

Next, you specify the list of optimized implementations for a function (they can also be generic, i.e. generate implementations for different types).
function.assembly_implementations = list()
For min reduction there are currently no optimized implementations, so the list is empty.

Next, you write a reference implementation of the function. Here is the implementation for min reduction.
function.c_implementation = """
Yep%(InputType0)s minimum = *vPointer++;
while (--length != 0) {
	const Yep%(InputType0)s v = *vPointer++;
	minimum = yepBuiltin_Min_%(InputType0)s%(InputType0)s_%(InputType0)s(v, minimum);
}
*minimumPointer++ = minimum;
return YepStatusOk;
"""

The following line specifies the unit test: it calls the optimized implementations with an array v of uniformly distributed random numbers and compares the results to the reference implementation.
function.unit_test = ReferenceUnitTest(v = Uniform())
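
The idea behind a reference-based test can be sketched in plain Python. This is not how ReferenceUnitTest is implemented (that code is not shown here); reference_min, candidate_min, and check_against_reference are hypothetical names, and the trial count, array lengths, and value range are arbitrary assumptions.

```python
import random

def reference_min(v):
    # Mirrors the reference loop: start from the first element, fold in the rest.
    minimum = v[0]
    for element in v[1:]:
        if element < minimum:
            minimum = element
    return minimum

def candidate_min(v):
    # Stand-in for an optimized implementation under test.
    return min(v)

def check_against_reference(candidate, reference, trials=100, max_length=64, seed=0):
    # Feed both implementations the same uniformly distributed random arrays.
    rng = random.Random(seed)
    for _ in range(trials):
        length = rng.randint(1, max_length)  # length is never zero
        v = [rng.uniform(-1.0, 1.0) for _ in range(length)]
        if candidate(v) != reference(v):
            return False
    return True

ok = check_against_reference(candidate_min, reference_min)
```

Exact comparison is reasonable for a minimum, since both implementations return one of the input elements unchanged; operations that round would need a tolerance instead.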

Finally, generate the functions:
function.generate("yepCore_Min_V8s_S8s(v, minimum, YepSize length: length != 0)")
function.generate("yepCore_Min_V8u_S8u(v, minimum, YepSize length: length != 0)")
function.generate("yepCore_Min_V16s_S16s(v, minimum, YepSize length: length != 0)")
function.generate("yepCore_Min_V16u_S16u(v, minimum, YepSize length: length != 0)")
function.generate("yepCore_Min_V32s_S32s(v, minimum, YepSize length: length != 0)")
function.generate("yepCore_Min_V32u_S32u(v, minimum, YepSize length: length != 0)")
function.generate("yepCore_Min_V64s_S64s(v, minimum, YepSize length: length != 0)")
function.generate("yepCore_Min_V64u_S64u(v, minimum, YepSize length: length != 0)")
function.generate("yepCore_Min_V32f_S32f(v, minimum, YepSize length: length != 0)")
function.generate("yepCore_Min_V64f_S64f(v, minimum, YepSize length: length != 0)")

Each call to function.generate will generate code for:
  • Function declaration in the C header (/library/headers/yepCore.h)
  • Default implementation (/library/sources/core/yepMin.impl.cpp)
  • Optimized implementations, if any (/library/sources/core/yepMin.*.asm)
  • Methods in Java class (/bindings/java/sources-java/info/yeppp/Core.java)
  • C implementation of the JNI method (/bindings/java/sources-jni/core/Min.c)
  • Declaration and implementation of the C# method (/bindings/clr/sources-csharp/core/Min.cs)
  • Declaration of the method in a FORTRAN module (/bindings/fortran/source/yepCore.f90)
  • Unit test (/unit-tests/sources/core/Min.cpp)
The type parameters are deduced from the function name: yepCore_Min_<InputTypes>_<OutputTypes>.
For the first function above, <InputTypes>=V8s (vector/array of 8-bit signed integers) and <OutputTypes>=S8s (scalar 8-bit signed integer).
Other types (length in this example) must be specified explicitly (see YepSize length above).
length != 0 means that the value of 0 is not acceptable for the length argument (the code-generator will generate a check for this value to return YepStatusInvalidArgument in this case).
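
To make the naming convention concrete, here is a small sketch that splits a function name into its module, operation, and type codes. It is a guess at the convention based only on the names in this thread, not the generator's actual parser: parse_function_name and parse_types are hypothetical, and reading the s/u/f suffixes as signed, unsigned, and floating-point is inferred from the listed names.

```python
import re

# yep<Module>_<Operation>_<InputTypes>_<OutputTypes>
_NAME = re.compile(r"^yep([A-Za-z]+)_([A-Za-z]+)_([A-Za-z0-9]+)_([A-Za-z0-9]+)$")
# One type code: V or S, a bit width, then s, u, or f.
_TYPE = re.compile(r"([VS])(8|16|32|64)([suf])")

_KINDS = {"V": "vector", "S": "scalar"}
_CLASSES = {"s": "signed integer", "u": "unsigned integer", "f": "floating-point"}

def parse_types(type_string):
    # Consume consecutive type codes, e.g. "V64fV64f" yields two vector types.
    types, position = [], 0
    while position < len(type_string):
        match = _TYPE.match(type_string, position)
        if match is None:
            raise ValueError("unrecognised type code in %r" % type_string)
        kind, bits, cls = match.groups()
        types.append((_KINDS[kind], int(bits), _CLASSES[cls]))
        position = match.end()
    return types

def parse_function_name(name):
    match = _NAME.match(name)
    if match is None:
        raise ValueError("unexpected function name %r" % name)
    module, operation, inputs, outputs = match.groups()
    return {
        "module": module,
        "operation": operation,
        "inputs": parse_types(inputs),
        "outputs": parse_types(outputs),
    }

info = parse_function_name("yepCore_Min_V8s_S8s")
```

Here info["inputs"] describes one vector of 8-bit signed integers and info["outputs"] one scalar of the same element type.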
If there is an argument named "length" in a function, the Yeppp! code-generator assumes that it denotes the length of all arrays in the function. If some arrays have their length specified by another argument, it should be specified explicitly with the array_arg[length_arg] notation, e.g. see the EvaluatePolynomial function from the Math module:
function.generate("yepMath_EvaluatePolynomial_V64fV64f_V64f(coef[coefCount], x, y, YepSize coefCount: coefCount != 0, YepSize length)")
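
The argument notation in such a signature string can be pulled apart with a short sketch like the one below. parse_signature is a hypothetical helper for illustration and makes no claim about how the Yeppp! generator parses these strings; it only handles the forms that appear in this thread (an optional explicit type, an optional [length_arg] suffix, and an optional constraint after a colon).

```python
import re

# Optional explicit type, argument name, optional [length_arg] suffix.
_ARG = re.compile(r"^(?:(?P<type>\w+)\s+)?(?P<name>\w+)(?:\[(?P<length_arg>\w+)\])?$")

def parse_signature(signature):
    name, _, rest = signature.partition("(")
    body = rest.rstrip()
    if not body.endswith(")"):
        raise ValueError("missing closing parenthesis in %r" % signature)
    arguments = []
    for piece in body[:-1].split(","):
        # Anything after a colon is a constraint on the argument's value.
        declaration, _, constraint = piece.partition(":")
        match = _ARG.match(declaration.strip())
        if match is None:
            raise ValueError("cannot parse argument %r" % piece)
        arguments.append({
            "name": match.group("name"),
            "type": match.group("type"),              # None when deduced from the name
            "length_arg": match.group("length_arg"),  # None when not given explicitly
            "constraint": constraint.strip() or None,
        })
    return name, arguments

name, arguments = parse_signature(
    "yepMath_EvaluatePolynomial_V64fV64f_V64f("
    "coef[coefCount], x, y, YepSize coefCount: coefCount != 0, YepSize length)"
)
```

In the result, coef carries length_arg "coefCount", while x and y have no explicit length and would fall back to the argument named length under the rule described above.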

That's the minimum needed to add functions to Yeppp! Providing optimized implementations is a different story.
After you implement a generate_* function and add a call to it in __main__, you can regenerate the Core module with the command
python codegen/core.py (the Yeppp! root should be the current directory).

Please note that the Yeppp! code-generator and build system are being refactored, so the interface will change in the future.

Regards,
Marat