Using objects across language boundaries
Contents
Using objects across language boundaries¶
Using objects across language boundaries
Exploring the limits of language interoperability and consistent API design
Motivation¶
scientific libraries are not a closed ecosystem
need for (easy) interfacing with other projects and robust dependency management
more flexibility when interfacing via APIs rather than IO
in-process communication rather than managing microservices
we already have enough incompatible / undocumented ASCII file formats for data exchange
Approches to interoperability¶
foreign function interface (libffi)
used in CPython with ctypes in standard library
source generation from C headers
direct compatibility to C
Important
There is no single solution for interoperability.
Language interoperability in Fortran¶
bind(c) with iso_c_binding for definition in Fortran
allows manipulation of C data types in Fortran
Fortran language constructs are easily accessible (e.g., using modules, pointer, class(*), etc.)
ISO_Fortran_binding.h for definition in C
allows manipulation of Fortran data types in C
provides a stable API for C, but does not guarantee ABI compatibility
typedef struct
{
CFI_index_t extent;
CFI_index_t sm;
CFI_index_t lower_bound;
} CFI_dim_t;
typedef struct
{
CFI_index_t lower_bound;
CFI_index_t extent;
CFI_index_t sm;
} CFI_dim_t;
we can only see symbols and pointers from other languages
generic interfaces, overloaded operators, … are not available (compile-time dispatch requires a compiler)
API bindings are not necessarily part of the upstream project (relying on semantic versioning conventions)
calling conventions different between platforms (cdecl vs. stdcall)
Note
Compilers can help exporting the C header:
gfortran -fc-prototypes -fsyntax-only foopss_api.f90 > foopss.h
Either as build system step (not all compilers support this) or to generate a template.
Object-oriented patterns for stable APIs¶
adding new features does not break function signatures
individual procedures can be simpler
nlopt_result
nlopt_minimize(
nlopt_algorithm algorithm,
int n,
nlopt_func f,
void* f_data,
const double* lb,
const double* ub,
double* x,
double* minf,
double minf_max,
double ftol_rel,
double ftol_abs,
double xtol_rel,
const double* xtol_abs,
int maxeval,
double maxtime
);
typedef struct nlopt_opt_s* nlopt_opt;
nlopt_opt
nlopt_create(nlopt_algorithm algorithm,
unsigned n);
nlopt_result
nlopt_set_min_objective(nlopt_opt opt,
nlopt_func f,
void* f_data);
nlopt_result
nlopt_optimize(nlopt_opt opt,
double* x,
double* opt_f);
void
nlopt_destroy(nlopt_opt opt);
ABI compatibility nightmares¶
interoperability cares about the application binary interface (ABI)
compatibility of application programmable interface (API) usually not enough
On the Fortran side¶
object-oriented APIs can leak symbols into dependent libraries
transient dependencies become full dependencies
no ABI conventions, easy to break ABI with API-compatible changes
changes in class definition result in ABI breakage
renaming functions in generic interfaces breaks ABI
adding (optional) arguments to procedures is breaking the ABI
so does changing the compiler
…
vigilance required when linking dynamically
very tight version pinning in package managers
large rebuild cascades in case of (ABI) incompatible version changes
failures happen when compiling, linking or loading incompatible versions
On the C side¶
most stable ABI available to build on
ABI issues can be hidden behind opaque pointers
objects must not be exchanged between libraries
each library must work only on its own objects
do not blindly cast typdef-ed pointers (generally a bad idea)
potential data redundancy and needless copying of the data is needed (if objects own their data)
(external) memory management is an open question for Fortran
allocate
/deallocate
statements can not be substituted from Ccould use
pointer
for using chunks of externally managed memory (different design required)
In C¶
Emulating objects in C¶
objects are represented as opaque pointers (
void *
)typedef-ed pointers for better readability and some type safety
C11
_Generic
can be used to dispatch at compile time
What makes a classy object¶
Fortran standard does not specify implementation of class objects
simple design model are C structs with overlapping fields and procedure pointers
type :: a_cls
character(len=:), allocatable :: label
integer :: ndim = 0
real(wp), allocatable :: bounds(:)
contains
procedure :: set_bounds
procedure :: get_bounds
end type
type, extends(a_cls) :: b_cls
! character(len=:), allocatable :: label
! integer :: ndim = 0
! real(wp), allocatable :: bounds(:)
! procedure, pointer :: set_bounds
! procedure, pointer :: get_bounds
real(wp) :: scale = 1.0_wp
contains
procedure :: set_scale
end type
struct a_cls {
char* label;
int32_t ndim;
double* bounds;
void (* set_bounds)(struct a_cls*, const double*);
void (* get_bounds)(struct a_cls*, double*);
};
struct b_cls {
char* label;
int32_t ndim;
double* bounds;
void (* set_bounds)(struct a_cls*, const double*);
void (* get_bounds)(struct a_cls*, double*);
double scale;
void (* set_scale)(struct b_cls*, double);
};
C-ish way of polymorphism builds on memory layout and casting pointers
procedure pointers provide a way to dispatch at runtime
casting around pointers probably worse than
select type
in Fortran
Important
Objects need not be transparent to be usable
C compatible wrapping¶
class polymorphic cannot be pointed to by a C-pointer
use thin wrapper type around Fortran objects
metadata can be attached to the wrapper
can store class polymorphic objects and preserve their state
allocation status of components is available
module foopss_api
use, intrinsic :: iso_c_binding, only : c_ptr, c_loc, &
& c_f_pointer, c_associated, c_null_ptr
use foopss_structure, only : structure_type
implicit none
type :: vp_structure
type(structure_type) :: raw
end type vp_structure
contains
function foopss_new_structure() result(new) bind(c)
type(c_ptr) :: new
type(vp_structure), pointer :: struc
allocate(struc)
new = c_loc(struc)
end function foopss_new_structure
subroutine foopss_delete_structure(ptr) bind(c)
type(c_ptr), intent(inout) :: ptr
type(vp_structure), pointer :: struc
if (c_associated(ptr)) then
call c_f_pointer(ptr, struc)
deallocate(struc)
ptr = c_null_ptr ! ensure repeated calls to delete become no-ops
end if
end subroutine foopss_delete_structure
end module foopss_api
deconstructor can either take the pointer by value or by reference
by value: C side has to invalidate the (dangling) pointer by assigning
NULL
by reference: Fortran side can nullify C pointer
typedef struct foopss_structure* foopss_structure;
extern foopss_structure
foopss_new_structure(void);
extern void
foopss_delete_structure(foopss_structure* /* struc */);
Bad practice C¶
nobody stops our users from accidentally passing the wrong objects (there will be warnings)
app/main.c:16:29: warning: initialization of ‘foopss_structure’ {aka ‘struct foopss_structure *’} from incompatible pointer type ‘foopss_context’ {aka ‘struct foopss_context *’} [-Wincompatible-pointer-types]
16 | foopss_structure struc = ctx;
| ^~~
app/main.c:17:28: warning: passing argument 1 of ‘foopss_delete_structure’ from incompatible pointer type [-Wincompatible-pointer-types]
17 | foopss_delete_structure(&ctx);
| ^~~~
| |
| struct foopss_context **
In file included from app/main.c:4:
./include/foopss.h:28:25: note: expected ‘struct foopss_structure **’ but argument is of type ‘struct foopss_context **’
28 | foopss_delete_structure(foopss_structure*);
| ^~~~~~~~~~~~~~~~~
even worse: the binding language to C cannot distinguish between different typdef-ed objects (e.g. Python’s ctypes)
Compile time dispatch in C¶
_Generic
allows us to dispatch typedef-ed pointers at compile timeusually defined as a type generic macro
difficult to use outside of C
_Generic((obj),
foopss_context: foopss_delete_context,
foopss_structure: foopss_delete_structure,
foopss_calculator: foopss_delete_calculator,
foopss_result: foopss_delete_result,
)(&obj)
Handling errors¶
well-behaving libraries must never stop under any circumstances
errors should be propagated up, the caller must be able to handle them gracefully
runtime must not be left in an undefined state (do not throw exceptions through foreign languages)
rich error reporting is preferable over plain error codes
can include error message and context more easily
implemented via opaque pointer or C-compatible struct
API for both querying (and creating) error context
/// Error instance
typedef struct foopss_error* foopss_error;
/// Create new error handle
extern foopss_error
foopss_new_error(void);
/// Delete an error handle
extern void
foopss_delete_error(foopss_error* /* error */);
/// Check error handle status
extern int32_t
foopss_check_error(foopss_error /* error */);
/// Get error message from error handle
extern void
foopss_get_error(foopss_error /* error */,
char* /* buffer */,
const int32_t* /* buffersize */);
Callback mechanism¶
IO features should be implemented in native language (logging, error reporting, …)
hooks deep inside algorithm need to communicate or exchange data
module foopss_api
use, intrinsic :: iso_c_binding
use foopss_context_type, only : context_type, context_logger
implicit none
!> Void pointer to manage calculation context
type :: vp_context
!> Actual payload
type(context_type) :: raw
end type vp_context
abstract interface
!> Interface for callbacks used in custom logger
subroutine callback(msg, len, udata)
import :: c_char, c_ptr, c_int32_t
!> Message payload to be displayed
character(kind=c_char) :: msg(*)
!> Length of the message
integer(c_int32_t) :: len
!> Data pointer for callback
type(c_ptr), value :: udata
end subroutine callback
end interface
!> Custom logger for calculation context to display messages
type, extends(context_logger) :: callback_logger
!> Data pointer for callback
type(c_ptr) :: udata = c_null_ptr
!> Custom callback function to display messages
procedure(callback), pointer, nopass :: callback => null()
contains
!> Entry point for context instance to log message
procedure :: message
end type callback_logger
contains
! ...
!> Create a new custom logger for the calculation context
subroutine foopss_set_context_logger(vctx, vproc, vdata) bind(c)
type(c_ptr), value :: vctx
type(vp_context), pointer :: ctx
type(c_funptr), value :: vproc
procedure(callback), pointer :: fptr
type(c_ptr), value :: vdata
if (.not.c_associated(vctx)) return
call c_f_pointer(vctx, ctx)
if (allocated(ctx%raw%io)) deallocate(ctx%raw%io)
if (c_associated(vproc)) then
call c_f_procpointer(vproc, fptr)
ctx%raw%io = new_callback_logger(fptr, vdata)
end if
end subroutine set_context_logger_api
!> Create a new custom logger
function new_callback_logger(fptr, udata) result(self)
!> Actual function used to display messages
procedure(callback) :: fptr
!> Data pointer used inside the callback function
type(c_ptr), intent(in) :: udata
!> New instance of the custom logger
type(callback_logger) :: self
self%callback => fptr
self%udata = udata
end function new_callback_logger
!> Entry point for context type logger, transfers message from context to callback
subroutine message(self, msg)
!> Instance of the custom logger with the actual logger callback function
class(callback_logger), intent(inout) :: self
!> Message payload from the calculation context
character(len=*), intent(in) :: msg
character(kind=c_char) :: charptr(len(msg))
charptr = transfer(msg, charptr)
call self%callback(charptr, size(charptr, kind=c_int32_t), self%udata)
end subroutine message
end module foopss_api
cannot directly use C procedure pointer due to conversion from Fortran to C string
call back needs to be wrapped twice in Fortran for robustness
string need not be
NUL
terminated when passing length as an argument
/// Context manager for the library usage
typedef struct foopss_context* foopss_context;
/// Define callback function for use in custom logger
typedef void (*foopss_logger_callback)(char*, int32_t, void*);
// ...
/// Set custom logger function
void
foopss_set_context_logger(foopss_context /* ctx */,
foopss_logger_callback /* callback */,
void* /* userdata */);
// ...
/// Example for a callback function
void
example_callback(char* msg, int32_t len, void* udata) {
/* print len chars from msg, since it need not NUL terminated */
printf("[callback] %.*s\n", len, msg);
}
can use callback mechanism to implement new classes via API
user data can be carried around via opaque pointer
Practical tips¶
C macros can help a lot with reducing repetitive boilerplate in the C header
#ifdef __cplusplus
# define FOOPSS_API_ENTRY extern "C"
#else
# define FOOPSS_API_ENTRY extern
#endif
/* for attributes or DLL export stuff on Windows */
#define FOOPSS_API_CALL
/* Labels to keep track of version introducing API */
#define FOOPSS_API__V_1_0
#define FOOPSS_API__V_1_0_DEPRECATED
#define FOOPSS_API__V_1_1
#define FOOPSS_API__V_1_2
// ...
/// Context manager for the library usage
typedef struct foopss_context* foopss_context;
/// Define callback function for use in custom logger
typedef void (*foopss_logger_callback)(char*, void*) FOOPSS_API__V_1_2;
/// Create new calculation environment object
FOOPSS_API_ENTRY foopss_context FOOPSS_API_CALL
foopss_new_context(void) FOOPSS_API__V_1_0;
/// Delete calculation context
FOOPSS_API_ENTRY void FOOPSS_API_CALL
foopss_delete_context(foopss_context* /* ctx */) FOOPSS_API__V_1_0;
/// Set custom logger function
FOOPSS_API_ENTRY void FOOPSS_API_CALL
foopss_set_context_logger(foopss_context /* ctx */,
foopss_logger_callback /* callback */,
void* /* userdata */) FOOPSS_API__V_1_2;
users can statically check whether an API feature is available by checking whether a macro is defined
In Python¶
Directly from Fortran to Python?¶
several automatic wrapper generators are available
numpy.f2py: is mainly targeting legacy Fortran code (objects are not supported)
f90wrap: is a wrapper generator for Fortran 90
gfort2py: GFortran module file based wrapper generator
many more abandoned projects and tools
low-level plumbing possible via Python header (Fortran bindings available with forpy)
more (stable) option to wrap C from Python than wrapping Fortran from Python
most projects need/want C API anyway
Python ctypes¶
ctypes: low-level Python interface to C
dynamic loading of shared libraries via dlopen
requires definition of signatures in Python (duplication)
library name is platform dependent and logic must be handled in Python
no type safety for opaque pointers (all declared as
c_void_p
)
no automatic deconstruction available for objects
manual deconstruction best wrapped by a context manager (
__enter__
and__exit__
hooks)
numpy.ctypeslib provides convenience functions
numpy objects pass through API without additional effort (
numpy.ctypeslib.as_array
)
no compiled dependencies in your Python package
no additional dependencies in Python
import ctypes as ct
import numpy as np
import os.path as op
# layout when installed in same $PREFIX is usually (not true on Windows)
# $PREFIX/lib/libfoopss.*
# $PREFIX/lib/python3.x/site-packages/foopss.py
_foopss = np.ctypeslib.load_library('libfoopss', op.dirname(__file__).parent.parent)
# Can only declare opaque handle, but not typedef it
_foopss.foopss_new_context.restype = ct.c_void_p
# For ctypes passing the void* by value might be preferable
_foopss.foopss_delete_context.argtypes = [ct.POINTER(ct.c_void_p)]
# Prototype for callback function, pass-through PyObject
logger_callback = ct.CFUNCTYPE(None, ct.c_char_p, ct.c_int32, ct.py_object)
# Some type safety for callback signatures, user-data
_foopss.foopss_set_context_logger.argtypes = [ct.c_void_p, logger_callback, ct.py_object]
_foopss.foopss_set_context_logger.restype = None
# ...
Foreign function interface¶
cffi module allows different access modes
in-line ABI level mode similar to Python ctypes (but allows typdefs)
out-of-line API mode most useful for creating standalone extension modules
runtime requirement on cffi module
C parser is very limited, manual preprocessing of headers required
preprocessed header can produce complete extension module source code
low effort solution for creating the actual glue code and defining the extension module
import os
import subprocess
import cffi
library = "foopss"
include_header = f'#include "{library}.h"'
prefix_var = library.upper() + "_PREFIX"
if prefix_var not in os.environ:
prefix_var = "CONDA_PREFIX"
module_name = f"{library}._lib{library}"
# Detection logic must provide:
# - dict with libraries to link as well as include, library and runtime directories
# - additional flags for C-compiler to find headers
kwargs = dict(libraries=[library])
cflags = []
if prefix_var in os.environ:
prefix = os.environ[prefix_var]
kwargs.update(
include_dirs=[os.path.join(prefix, "include")],
library_dirs=[os.path.join(prefix, "lib")],
runtime_library_dirs=[os.path.join(prefix, "lib")],
)
cflags.append("-I" + os.path.join(prefix, "include"))
# pycparser cannot handle preprocessor gracefully
# - manual preprocessing required via C compiler or cpp required
cc = os.environ["CC"] if "CC" in os.environ else "cc"
p = subprocess.Popen(
[cc, *cflags, "-E", "-"],
stdin=subprocess.PIPE,
stdout=subprocess.PIPE,
stderr=subprocess.PIPE,
)
out, err = p.communicate(include_header.encode())
# Definitions of API from header
cdefs = out.decode()
# Final construction extension module source
# - distutils/setuptools handle this automatically
# - manual run has to write the C source
ffibuilder = cffi.FFI()
ffibuilder.set_source(module_name, include_header, **kwargs)
ffibuilder.cdef(cdefs)
if __name__ == "__main__":
ffibuilder.distutils_extension(".")
From errors to exceptions¶
ctypes or cffi wrapped C-libraries still look and feel like C when used in Python
create convenience wrappers for creating (and deleting) objects
cffi can setup a proper garbage collection routine for any object
use a decorator for transforming library errors to exception
import functools
from ._libfoopss import ffi, lib
def _delete_error(err):
"""Delete an error handle object"""
ptr = ffi.new("foopss_error *")
ptr[0] = err
lib.foopss_delete_error(ptr)
def new_error():
"""Create new error handle object"""
return ffi.gc(lib.foopss_new_error(), _delete_error)
def error_check(func):
"""Handle errors for library functions that require an error handle"""
@functools.wraps(func)
def handle_error(*args, **kwargs):
"""Run function and than compare context"""
_err = new_error()
value = func(_err, *args, **kwargs)
if lib.foopss_check_error(_err):
_message = ffi.new("char[]", 512)
lib.foopss_get_error(_err, _message, ffi.NULL)
raise RuntimeError(ffi.string(_message).decode())
return value
return handle_error
Marshalling errors in callbacks¶
exceptions must not be thrown to foreign runtimes
leaves the foreign runtime in an undefined state
probably will crash due to corrupted stack
callbacks must serialize their exceptions or errors
e.g. C++ callback in Fortran runtime used from Python
cannot assume that upstream can make sense of the exception
From a C-ish to a Pythonic API¶
thin wrappers still retain C-ish handling for objects
okay for procedural APIs
different expectations between object-oriented C and Python
another layer needed to make Pythonic classes and objects
import numpy as np
from . import library
class Structure:
"""
Represents a wrapped structure object in foopss.
Example
-------
>>> from foopss.interface import Structure
>>> import numpy as np
>>> mol = Structure(
... positions=np.array([
... [+0.00000000000000, +0.00000000000000, -0.73578586109551],
... [+1.44183152868459, +0.00000000000000, +0.36789293054775],
... [-1.44183152868459, +0.00000000000000, +0.36789293054775],
... ]),
... numbers = np.array([8, 1, 1]),
... )
...
"""
_mol = library.ffi.NULL
def __init__(self, numbers, positions, lattice, periodic):
_numbers = np.ascontiguousarray(numbers, dtype="i4")
_positions = np.ascontiguousarray(positions, dtype="float")
_lattice = np.ascontiguousarray(lattice, dtype="float")
_periodic = np.ascontiguousarray(periodic, dtype="bool")
self._mol = library.new_structure(
self._natoms,
_cast("int32_t*", _numbers)
_cast("double*", _positions),
_cast("double*", _lattice),
_cast("bool*", _periodic),
)
def _cast(ctype, array):
"""Cast a numpy array to a FFI pointer"""
return (
library.ffi.NULL
if array is None
else library.ffi.cast(ctype, array.ctypes.data)
)
Caveats¶
serialization of objects with C components is not possible by default
opaque handles cannot be deepcopied or pickled
Installing software at advanced difficulty¶
language-specific build systems (setuptools, fpm, …) have shortcomings for interlanguage bindings
cross-compilation for bindings or extensions usually poorly supported (if at all)
particularities of foreign language must be accounted for in the build system
distribution model of language might not be flexible enough (Python wheels, …)
start with general-purpose build system (meson, cmake, …)
clean support for handling multiple languages
possibility to split build into multiple projects (useful for distributions)
Summary¶
What can we do?¶
opaque pointers and callbacks allow emulating Fortran classes
Fortran compiler can help exporting C headers directly from Fortran source code
wrapper generators can consume C headers to produce language-specific bindings
build systems already encode the required knowledge to glue languages together
keep compatible versions together via package managers
What would we like?¶
mechanism to dispatch or recover type information of opaque pointers
runtime type checking, e.g. recover mixed up arguments
dispatching in generic routines, e.g. deconstruction
convenience features for generating C from Fortran
declare typedefs in Fortran for opaque pointers
better encoding support for handling strings
export native types from Fortran like CFI descriptors in an ABI compatible way
more transparent memory management by exposing CFI allocator / deallocator
Further reading¶
Crafting Interpreters by Robert Nystrom
Making Libraries Consumable for Non-C++ Developers by Aaron R. Robinson