1. Introduction
In his groundbreaking work [1,2], particularly Chapters 2-5 of [2], Anastassiou first established quantitative approximation rates for neural networks approximating continuous functions. He achieved this through specialized Cardaliaguet-Euvrard and "squashing" operators, deriving convergence rates using the modulus of continuity of target functions (and their higher-order derivatives) while proving sharp Jackson-type inequalities. Both univariate and multivariate cases received rigorous treatment, with the defining kernels of these operators - "bell-shaped" and "squashing" - assumed to have compact support for strong localization and stability.
Building upon this foundation and inspired by Chen and Cao's work [5], the author extended this research by introducing and analyzing quasi-interpolation operators activated by sigmoidal and hyperbolic tangent functions. This culminated in a comprehensive treatment of univariate, multivariate, and fractional cases [3,4], establishing a robust framework for studying new families of neural network operators with enhanced convergence and stability properties.
This paper advances this framework by developing a class of multivariate symmetrized and perturbed hyperbolic tangent-activated neural network operators and establishing their Voronovskaya-type asymptotic expansions for differentiable mappings f ∈ C^m(ℝ^N, ℝ), where m, N ∈ ℕ. The proposed symmetrization, combined with parametric deformation of the hyperbolic tangent function, enhances both the approximation capability and regularity of the resulting operators. This yields a more precise asymptotic description of their behavior as network size increases, revealing new structural properties relevant to high-dimensional approximation problems.
For recent related developments in neural network approximation theory, see [8,9] and references therein. The classical works [6,7] remain fundamental for a comprehensive introduction to neural networks and their architectures.
We now formalize the multilayer feed-forward structure considered in this study. Let L denote the number of hidden layers. For an input vector x ∈ ℝ^d with d ∈ ℕ, we define weight vectors w_j ∈ ℝ^d, coefficients θ_j ∈ ℝ, and biases b_j ∈ ℝ for j = 1, …, s, where s represents the number of neurons per layer.
Using ⟨·,·⟩ to denote the Euclidean inner product, the activation at node j is given by σ(⟨w_j, x⟩ + b_j). The network output at the first layer is then:
Higher-level compositions can be defined recursively. For example, the second-level composition is:
More generally, for any level, we define:
This recursive structure captures the essence of multilayer feed-forward networks. The specific choice of activation function, along with the weight and bias distributions, determines the approximation properties analyzed in subsequent sections.
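For readers who prefer a computational view, the following Python sketch mirrors the recursive layer composition just described. It is only an illustration of the structure: the activation, the layer width s, and the random weights and biases are placeholder choices, not the specific operators analyzed in the remainder of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(x, W, b, theta, sigma):
    """One hidden level: sum_j theta_j * sigma(<w_j, x> + b_j)."""
    return theta @ sigma(W @ x + b)

def feedforward(x, levels, sigma):
    """Recursive composition: the scalar output of each level feeds the next."""
    out = np.asarray(x, dtype=float)
    for W, b, theta in levels:
        out = np.atleast_1d(layer(out, W, b, theta, sigma))
    return float(out[0])

# Illustrative two-level network: s = 5 neurons per level, input in R^3.
s, d = 5, 3
levels = [(rng.normal(size=(s, d)), rng.normal(size=s), rng.normal(size=s)),
          (rng.normal(size=(s, 1)), rng.normal(size=s), rng.normal(size=s))]
print(feedforward(rng.normal(size=d), levels, np.tanh))
```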
2. Mathematical Formulations
Following the framework established in [4], we define the perturbed hyperbolic tangent activation function as:
Here, λ serves as a scaling parameter, while q acts as a deformation coefficient. For a comprehensive discussion, see Chapter 18 of [4], titled "q-Deformed and λ-Parameterized Hyperbolic Tangent-Based Banach Space-Valued Neural Network Approximation".
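As a hedged illustration only, the following Python sketch assumes the standard q-deformed, λ-parameterized form g_{q,λ}(x) = (e^{λx} − q e^{−λx}) / (e^{λx} + q e^{−λx}) discussed in [4]. Under this assumption, q = 1 recovers tanh(λx), and the function coincides with the shifted form tanh(λx − (ln q)/2), which the later sketches use as an overflow-safe rewriting.

```python
import numpy as np

def g(x, q=1.0, lam=1.0):
    """q-deformed, lambda-parameterized hyperbolic tangent (assumed form, cf. [4])."""
    ep, em = np.exp(lam * x), q * np.exp(-lam * x)
    return (ep - em) / (ep + em)

x = np.linspace(-4.0, 4.0, 9)
print(np.allclose(g(x, q=1.0, lam=1.0), np.tanh(x)))        # q = 1 recovers tanh(lam * x)
print(np.allclose(g(x, q=2.0, lam=1.5),
                  np.tanh(1.5 * x - 0.5 * np.log(2.0))))    # equivalent shifted-tanh form
print(g(x, q=2.0, lam=1.5))                                 # deformed, no longer odd in x
```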
Symmetrization Method
We implement a half-data feed strategy for our multivariate neural networks by defining the following density-type kernel:
This kernel is positive for all real arguments and exhibits the following symmetry relations:
By summing these expressions, we obtain:
This allows us to define the even (symmetric) function:
2.1. Key Properties and Extremal Values
The analysis of the two kernel functions reveals several fundamental properties that are crucial for understanding their behavior and applications in neural network approximation theory. Most notably, these functions attain their global maximum values at symmetric points, as established in [4].
Theorem 1
(Extremal Values of Kernel Functions).
For the deformation parameter q > 0 and scaling parameter λ > 0, the two kernel functions satisfy the following extremal property:
This result demonstrates that:
- 1.
The functions are symmetric with respect to the origin when q = 1.
- 2.
For q ≠ 1, the functions attain their maximum values at two points located symmetrically about the origin.
- 3.
The maximum value is independent of the deformation parameter q and depends only on the scaling parameter λ.
This symmetry and extremal property play a fundamental role in the construction of the symmetric kernel and in establishing the approximation properties of the resulting neural network operators.
Remark 1.
The common maximum represents the peak amplitude of both kernel functions. As λ increases, this maximum value approaches its limiting value, corresponding to the case where the deformed hyperbolic tangent approaches a step function. This observation connects our deformed kernels to the classical sigmoidal activation functions used in neural networks.
Proof. The proof follows directly from the definition of the kernel in (5) and the properties of the deformed hyperbolic tangent function defined in (4). A similar calculation applies to the second kernel, completing the proof. □
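The symmetry and extremal behavior of Theorem 1 are easy to probe numerically. The sketch below relies on assumed forms inferred from the surrounding description, not quoted from the text: a density-type kernel, here named Ψ as a placeholder, built as a scaled difference of shifted deformed tangents, and the symmetrized kernel taken as the average of the q- and 1/q-deformed versions.

```python
import numpy as np

def g(x, q, lam):
    # Overflow-safe form of the assumed deformed hyperbolic tangent.
    return np.tanh(lam * x - 0.5 * np.log(q))

def psi(x, q, lam):
    # Assumed density-type kernel: scaled difference of shifted deformed tangents.
    return 0.25 * (g(x + 1.0, q, lam) - g(x - 1.0, q, lam))

def phi(x, q, lam):
    # Assumed symmetrized kernel: average of the q- and 1/q-deformed kernels.
    return 0.5 * (psi(x, q, lam) + psi(x, 1.0 / q, lam))

q, lam = 2.0, 1.5
x = np.linspace(-6.0, 6.0, 24001)
print(np.allclose(phi(x, q, lam), phi(-x, q, lam)))                 # Phi is even
i, j = np.argmax(psi(x, q, lam)), np.argmax(psi(x, 1.0 / q, lam))
print(x[i], x[j])                                  # maxima located symmetrically about 0
print(psi(x, q, lam)[i], psi(x, 1.0 / q, lam)[j])  # equal peak values (q-independent)
```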
2.2. Partition of Unity Property
A fundamental property of these kernel functions is their ability to form a partition of unity, which plays a crucial role in approximation theory and in the construction of neural network operators. This property is formally established in the following theorem:
Theorem 2
(Partition of Unity for Deformed Hyperbolic Tangent Kernels).
For fixed parameters q > 0 and λ > 0, both kernel functions satisfy the partition of unity property:
As a direct consequence, the symmetrized kernel defined in (8) also satisfies:
Proof. The proof follows from the specific construction of the kernels and their properties:
- 1.
The function defined in (4) is a deformed hyperbolic tangent that approaches 1 as its argument tends to +∞ and −1 as its argument tends to −∞.
- 2.
The kernel is constructed as a difference of shifted versions of the deformed hyperbolic tangent, specifically:
- 3.
As the argument tends to infinity, the kernel decays to zero exponentially fast due to the properties of the deformed hyperbolic tangent.
- 4.
The sum over integer translates forms a telescoping series that converges to 1 for all real arguments, due to the specific construction and the behavior of the deformed hyperbolic tangent at infinity.
- 5.
The same argument applies to the reciprocal-parameter kernel due to the symmetry relation established in (6).
- 6.
The result for the symmetrized kernel follows directly from its definition as the average of the two kernels.
For a complete rigorous proof, see [4]. □
Corollary 1
(Multivariate Partition of Unity).
The multivariate kernel Z (see (31) below) satisfies the following partition of unity property in ℝ^N:
where x = (x_1, …, x_N) ∈ ℝ^N and k = (k_1, …, k_N) ∈ ℤ^N.
Proof. This follows directly from the univariate partition of unity property (11) and the definition of Z as a product of univariate kernel functions. □
2.3. Normalization Property
A fundamental property of these kernel functions is their normalization, which ensures that they integrate to unity over the real line. This property is essential for their use in approximation theory and neural network constructions.
Theorem 3
(Normalization of Kernel Functions).
For fixed parameters q > 0 and λ > 0, both kernel functions satisfy the normalization property:
As a direct consequence, the symmetrized kernel defined in (8) is also normalized:
Proof. The normalization property follows from the construction of the kernel as a difference of shifted deformed hyperbolic tangent functions. Specifically:
- 1.
The deformed hyperbolic tangent approaches 1 as its argument tends to +∞ and −1 as its argument tends to −∞.
- 2.
The kernel is constructed as:
- 3.
The integral of the kernel over the real line can be shown to equal 1; the apparent difficulty in the term-by-term computation is resolved by the proper normalization factor in the construction of the kernel, which ensures the integral equals 1. A rigorous proof is provided in Theorem 18.2 of [4].
□
Remark 2.
The normalization property has several important implications:
It ensures that the kernels can be interpreted as probability density functions.
It guarantees that constant functions are preserved under convolution with these kernels.
It is essential for proving convergence results of approximation operators constructed from these kernels.
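Both the discrete partition of unity and the normalization can be checked numerically under the same assumed kernel forms used in the earlier sketch; the lattice window and the integration grid below are illustrative truncations, justified by the exponential decay discussed in the next subsection.

```python
import numpy as np

def g(x, q, lam):
    return np.tanh(lam * x - 0.5 * np.log(q))   # overflow-safe assumed deformed tanh

def phi(x, q=2.0, lam=1.5):
    psi = lambda t, qq: 0.25 * (g(t + 1.0, qq, lam) - g(t - 1.0, qq, lam))
    return 0.5 * (psi(x, q) + psi(x, 1.0 / q))

# Discrete partition of unity: sum_k Phi(x - k) = 1 at several sample points.
xs = np.linspace(-0.5, 0.5, 7)
k = np.arange(-15, 16)
print(max(abs(phi(x - k).sum() - 1.0) for x in xs))   # close to machine precision

# Normalization: the integral of Phi over R equals 1 (Riemann sum on a wide window).
t, h = np.linspace(-30.0, 30.0, 600001, retstep=True)
print(phi(t).sum() * h)                               # ~1.0
```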
2.4. Exponential Decay Property
An important feature of the kernel functions is their exponential decay, which contributes to the localization properties of the associated approximation operators.
Theorem 4
(Exponential Decay of Kernel Functions).
For admissible parameters q and λ, and suitable constants, the kernel functions satisfy the following exponential decay estimates:
where the constant T is defined as:
Similarly, for the reciprocal kernel:
As a direct consequence, the symmetrized kernel Φ satisfies:
Proof. To rigorously establish the exponential decay property, we proceed through the following steps:
The deformed hyperbolic tangent function exhibits the following asymptotic behavior:
with exponential convergence to these limits. Specifically, for large arguments, we have the precise estimate:
where sgn denotes the sign function.
Recall the definition of the kernel:
For sufficiently large arguments, both shifted terms share the same sign, allowing us to derive the following bound:
For fixed n and x, we partition the infinite sum into local and tail components:
The number of terms in the local sum is bounded by:
Since the kernel is uniformly bounded by 1 for all x, we obtain:
Using the exponential decay bound from (21), we estimate:
This geometric series can be bounded as follows:
For lattice points in the tail range, the exponent grows at least linearly with the distance from nx. Therefore:
Simplifying further, we obtain the desired bound:
where T is defined in (16).
The same argument applies to the reciprocal kernel due to the symmetry relation:
For the symmetrized kernel Φ, we have:
which completes the proof. □
Remark 3
(Implications of Exponential Decay). The exponential decay property established in Theorem 4 has several significant implications:
- 1.
Localization Property: The kernels and their symmetrization Φ are effectively localized. For large n, the sum is dominated by terms for which nx − k is small, enabling efficient numerical approximations.
- 2.
Numerical Stability: The exponential decay ensures that truncating the infinite sums in practical computations introduces only exponentially small errors, which is crucial for numerical stability (see the sketch following this remark).
- 3.
Convergence Analysis: This property is fundamental for establishing convergence rates of the associated approximation operators. It allows for precise control of the tail behavior in error estimates, leading to sharp convergence results.
- 4.
Sparse Representations: In numerical implementations, the exponential decay enables efficient sparse representations of the kernel sums, significantly reducing computational complexity in many cases.
- 5.
Error Bounds: The explicit form of the decay bound (with constant T defined in (16)) provides concrete error bounds that can be used in the analysis of approximation algorithms.
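The truncation point of Remark 3 is illustrated by the following sketch: under the assumed kernel form, restricting the lattice sum to the window |nx − k| ≤ n^{1−β} discards only an exponentially small tail. The values n = 100, β = 1/2, and x = 0.3 are arbitrary.

```python
import numpy as np

def g(x, q, lam):
    return np.tanh(lam * x - 0.5 * np.log(q))   # overflow-safe assumed deformed tanh

def phi(x, q=2.0, lam=1.5):
    psi = lambda t, qq: 0.25 * (g(t + 1.0, qq, lam) - g(t - 1.0, qq, lam))
    return 0.5 * (psi(x, q) + psi(x, 1.0 / q))

n, beta, x = 100, 0.5, 0.3
k = np.arange(-5 * n, 5 * n + 1)                  # effectively the full lattice
w = phi(n * x - k)
near = np.abs(n * x - k) <= n ** (1.0 - beta)     # local window of radius n^(1-beta) = 10
print(w.sum())                                    # full sum: ~1 (partition of unity)
print(abs(w.sum() - w[near].sum()))               # truncation error: exponentially small
print(near.sum(), "of", k.size, "terms retained") # only a handful of lattice points matter
```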
2.5. Multivariate Extension via Tensor Products
To generalize the univariate kernel construction to higher dimensions, we employ the tensor product approach. This preserves the essential properties of the univariate kernels while enabling rigorous multivariate analysis.
Definition 1
(Multivariate Kernel Construction).
Let Φ be the univariate kernel defined in (8). The corresponding multivariate kernel Z is defined via the tensor product:
This construction naturally extends the properties of Φ to the multivariate setting.
Theorem 5
(Fundamental Properties of the Multivariate Kernel).
Let Z be the multivariate kernel defined in (31). Then Z satisfies the following fundamental properties. It is positive, that is,
It satisfies a discrete partition of unity:
and it is dilation-invariant in the sense that, for any n ∈ ℕ,
Finally, Z is normalized:
Proof. (P1) Positivity: Each univariate factor is positive, which implies that Z is positive.
(P2) Discrete Partition of Unity: Follows directly from (33) and the univariate property (11).
(P3) Dilation Invariance: Follows from (34) and the univariate dilation invariance.
(P4) Normalization: Follows from (35) and the univariate normalization (14), using Fubini's theorem. □
Corollary 2
(Exponential Decay of the Multivariate Kernel).
Let Z be the multivariate kernel defined in (31), and equip ℝ^N with the sup-norm ‖·‖_∞. For any x ∈ ℝ^N and n ∈ ℕ such that the localization condition of Theorem 4 holds, the kernel Z exhibits exponential decay outside a sup-norm neighborhood of x:
where the constant is as defined in (16).
This result shows that the contributions of Z(nx − k) are exponentially small for lattice points k lying outside the sup-norm neighborhood of x, reflecting the strong localization of the multivariate kernel.
Proof. Let k be a lattice point outside the stated neighborhood of x. For each such k, there exists at least one index i for which the corresponding coordinate satisfies the univariate decay condition. Then:
absorbing constants into T for sufficiently large n. □
Theorem 6
(Exponential Decay of the Multivariate Kernel).
Let Z be the multivariate kernel defined in (31), and equip ℝ^N with the sup-norm ‖·‖_∞. Then, for any x ∈ ℝ^N and n ∈ ℕ satisfying the localization condition above, the kernel satisfies the exponential decay estimate
where T is the constant appearing in the univariate decay bound.
Proof. Let k be a lattice point lying outside the sup-norm neighborhood of x. For each such k, there exists at least one index i such that the corresponding coordinate satisfies the univariate decay condition. Using the tensor-product structure of Z, we have
Next, we bound the number of terms in the sum. For each fixed index i, the remaining N − 1 coordinates can take at most a polynomially bounded number of integer values. Accounting for all N choices of the index i, we obtain
Since the decay is exponential, the polynomial factor can be absorbed into the exponential by adjusting T if necessary. Therefore, we conclude
proving the claimed exponential decay. □
2.6. Neural Network Operators
Using the multivariate kernel Z, we define several types of neural network operators that serve as primary tools for function approximation in high-dimensional settings.
Let f ∈ C^m(ℝ^N), where m, N ∈ ℕ. For a multi-index α = (α_1, …, α_N) with |α| = α_1 + ⋯ + α_N, denote the partial derivative of order |α| by
The maximum norm of derivatives of order m is defined as
Let C_B(ℝ^N) denote the space of continuous and bounded functions on ℝ^N.
Definition 2
(Neural Network Operators). Let f ∈ C_B(ℝ^N), n ∈ ℕ, and x ∈ ℝ^N. Using the multivariate kernel Z, we define the following classes of neural network operators, which generalize classical approximation operators to the multivariate setting:
1. Quasi-interpolation operator:
The quasi-interpolation operator approximates f by evaluating it at the scaled integer lattice points k/n and weighting these evaluations by the kernel:
This operator is simple and computationally efficient, relying only on pointwise values of f. Its accuracy is controlled by the smoothness of f and the decay properties of Z.
2. Kantorovich-type operator:
The Kantorovich operator generalizes the quasi-interpolation operator by replacing pointwise evaluations with local integrals over small hypercubes of side length 1/n:
This construction improves approximation properties for functions that are less regular or only integrable, and it preserves linear functionals, making it suitable for analysis in L^p spaces.
3. Quadrature-type operator:
To further enhance flexibility and numerical implementation, one can define local quadrature approximations of the integrals in . Let
where specifies the number of quadrature points in each dimension, indexes the points, and are the corresponding quadrature weights satisfying . The quadrature-type operator is then defined as
This operator interpolates between pointwise and integral-based approximations, providing a practical scheme for high-dimensional problems and allowing for flexible choice of quadrature rules.
Remarks:
- 1.
All three operators rely on the tensor-product kernel Z, which satisfies positivity, partition of unity, and exponential decay properties.
- 2.
The quasi-interpolation operator is computationally simplest, the Kantorovich operator is more robust for rough functions, and the quadrature operator allows numerical integration with controlled accuracy (all three are sketched in code below).
- 3.
These operators provide the foundation for convergence analysis in the pointwise, uniform, and L^p senses, as shown in subsequent theorems.
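To make the three constructions concrete, the following univariate sketch implements quasi-interpolation, Kantorovich-type, and quadrature-type operators built on the assumed kernel from the earlier sketches (the multivariate case simply replaces the univariate kernel by the tensor product Z). The truncation radius, the midpoint rule for the cell averages, the quadrature nodes and weights, and the test function are all illustrative choices.

```python
import numpy as np

def g(x, q, lam):
    return np.tanh(lam * x - 0.5 * np.log(q))   # overflow-safe assumed deformed tanh

def phi(x, q=2.0, lam=1.5):
    psi = lambda t, qq: 0.25 * (g(t + 1.0, qq, lam) - g(t - 1.0, qq, lam))
    return 0.5 * (psi(x, q) + psi(x, 1.0 / q))

def lattice(x, n, K=40):
    return np.arange(int(n * x) - K, int(n * x) + K + 1)

def quasi(f, x, n):
    """Quasi-interpolation: sum_k f(k/n) Phi(n x - k)."""
    k = lattice(x, n)
    return np.sum(f(k / n) * phi(n * x - k))

def kantorovich(f, x, n, m=64):
    """Kantorovich-type: pointwise values replaced by averages over [k/n, (k+1)/n]."""
    k = lattice(x, n)
    t = (np.arange(m) + 0.5) / (m * n)                     # midpoint grid on the cell
    avg = np.array([np.mean(f(kj / n + t)) for kj in k])   # n * integral over the cell
    return np.sum(avg * phi(n * x - k))

def quadrature(f, x, n, theta=2):
    """Quadrature-type: weighted point evaluations with weights summing to 1."""
    k = lattice(x, n)
    r = np.arange(1, theta + 1) / (theta * n)              # illustrative nodes in (0, 1/n]
    w = np.full(theta, 1.0 / theta)
    vals = np.array([np.sum(w * f(kj / n + r)) for kj in k])
    return np.sum(vals * phi(n * x - k))

f, x0 = np.sin, 0.4
for n in (10, 100, 1000):
    print(n, quasi(f, x0, n) - f(x0),
             kantorovich(f, x0, n) - f(x0),
             quadrature(f, x0, n) - f(x0))
```

On a typical run the printed differences shrink as n grows, consistent with the convergence results stated next.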
Theorem 7
(Convergence of Multivariate Neural Network Operators).
Let f ∈ C^m(ℝ^N) with bounded derivatives up to order m, and let the quasi-interpolation, Kantorovich-type, and quadrature-type operators be as defined in (48)–(51). Then, for each x ∈ ℝ^N, we have the following convergence results:
with the quantitative error estimates
where the constant depends only on the kernel Z and the dimension N.
Proof. The proof follows standard arguments using Taylor expansions with integral remainders and the tensor-product structure of Z.
For the quasi-interpolation operator, write
where the last term is the remainder. Summing against the kernel and using its moment properties yields
For the Kantorovich operator, the integral averages satisfy
where the remainder is again bounded by the same quantity.
Similarly, for the quadrature operator, the local quadrature averages approximate the integrals to the stated order under standard assumptions on the weights.
Combining these estimates for all operators concludes the proof. □
Theorem 8
(Fundamental Properties of Neural Network Operators).
Let f, g ∈ C_B(ℝ^N) and let two real scalars be given. For all n ∈ ℕ and x ∈ ℝ^N, each operator defined in (48)–(51) satisfies the following properties:
The operators are linear, i.e.,
They are positive: if f(x) ≥ 0 for all x ∈ ℝ^N, then
Moreover, they reproduce constants: if f is a constant function, then
Proof. Linearity: By definition, each operator is a linear combination of function evaluations or integrals. For the quasi-interpolation operator:
Analogous arguments hold for the Kantorovich and quadrature operators, using linearity of integrals and sums.
Positivity: Since the kernel Z is positive and the quadrature weights are nonnegative, the output is nonnegative for any nonnegative f.
Reproduction of Constants: If f is constant, then
where we used the partition of unity property of Z. □
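Reproduction of constants and positivity can also be confirmed directly on the sketched quasi-interpolation operator; as before, the kernel form and all parameters are assumptions carried over from the earlier sketches.

```python
import numpy as np

def g(x, q, lam):
    return np.tanh(lam * x - 0.5 * np.log(q))   # overflow-safe assumed deformed tanh

def phi(x, q=2.0, lam=1.5):
    psi = lambda t, qq: 0.25 * (g(t + 1.0, qq, lam) - g(t - 1.0, qq, lam))
    return 0.5 * (psi(x, q) + psi(x, 1.0 / q))

def quasi(f, x, n, K=40):
    k = np.arange(int(n * x) - K, int(n * x) + K + 1)
    return np.sum(f(k / n) * phi(n * x - k))

c = 3.7
print(quasi(lambda t: np.full_like(t, c), 0.42, n=25))         # ~3.7: constants reproduced
print(quasi(lambda t: np.maximum(t, 0.0), -0.3, n=25) >= 0.0)  # f >= 0 implies output >= 0
```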
Remark 4
(Approximation Order). The operators defined above exhibit specific approximation properties:
-
If f ∈ C^m(ℝ^N) with bounded derivatives, then by a standard Taylor expansion of f about the grid nodes k/n, we obtain:
and similarly for the Kantorovich and quadrature-type operators, depending on the quadrature rule.
Higher-order moment conditions on Φ or higher-degree quadrature rules can improve this rate, leading to asymptotic Voronovskaya-type expansions as discussed in later sections.
Remark 5
(Interpretation of Kantorovich Operator).
The integral in (49) can also be expressed as:
indicating that the Kantorovich operator represents a shifted average of f over cubes of side length 1/n centered at the grid nodes k/n.
3. Main Results
In this section, we rigorously analyze the approximation properties of the multivariate quasi-interpolation, Kantorovich-type, and quadrature-type neural network operators by deriving Voronovskaya-type asymptotic expansions. Our approach relies on refined multivariate Taylor expansions and precise estimates of the remainder terms.
3.1. Voronovskaya-Type Expansion for Basic Operators
Theorem 9
(Voronovskaya-Type Asymptotic Expansion).
Let f ∈ C^m(ℝ^N), let n ∈ ℕ be sufficiently large, and let x ∈ ℝ^N, where m, N ∈ ℕ. Assume that all partial derivatives of order m are bounded, i.e., bounded for every multi-index α with |α| = m. Let 0 < β < 1. Then, for the quasi-interpolation neural network operator we have
which equivalently implies
Furthermore, if all partial derivatives of order between 1 and m vanish at x, then
Proof. We begin by defining, for t ∈ [0, 1], the univariate path function
By the chain rule, its j-th derivative satisfies
Using the standard multivariate Taylor formula with integral remainder:
where
Applying the operator and substituting the grid points, we obtain
where the remainder term satisfies the estimate
Consequently, for n sufficiently large,
yielding the desired asymptotic expansion (69)-(70). The case of vanishing lower-order derivatives directly implies (71). □
Remark 6.
The above theorem rigorously establishes the rate of convergence of the quasi-interpolation operator in terms of the smoothness m of f, the dimensionality N, and the parameter β. The remainder estimate is uniform with respect to x under the bounded-derivative condition, providing a clear path toward the derivation of analogous expansions for Kantorovich-type operators.
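The convergence behavior described in Theorem 9 and Remark 6 can be probed empirically. The sketch below measures the pointwise error of the (assumed-form) quasi-interpolation operator for increasing n and fits an empirical convergence order by log-log regression; the test function and all parameter choices are illustrative.

```python
import numpy as np

def g(x, q, lam):
    return np.tanh(lam * x - 0.5 * np.log(q))   # overflow-safe assumed deformed tanh

def phi(x, q=2.0, lam=1.5):
    psi = lambda t, qq: 0.25 * (g(t + 1.0, qq, lam) - g(t - 1.0, qq, lam))
    return 0.5 * (psi(x, q) + psi(x, 1.0 / q))

def quasi(f, x, n, K=60):
    k = np.arange(int(n * x) - K, int(n * x) + K + 1)
    return np.sum(f(k / n) * phi(n * x - k))

f, x0 = np.sin, 0.4
ns = np.array([20, 40, 80, 160, 320])
err = np.array([abs(quasi(f, x0, n) - f(x0)) for n in ns])
slope = np.polyfit(np.log(ns), np.log(err), 1)[0]
print(err)                         # pointwise errors for increasing n
print("empirical order:", -slope)  # fitted exponent of the observed decay in n
```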
3.2. Preparatory Observations
For any multi-index α with |α| = m, we adopt the standard combinatorial notation
and define the remainder integrand in the multivariate Taylor expansion as
which will play a central role in estimating the remainder term.
Under the uniform grid assumption
we can bound the remainder R as
Moreover, summing over all nodes with the kernel weight function yields the global estimate
These estimates complete the proof of Theorem 9 and provide a solid foundation for deriving analogous Voronovskaya-type expansions for Kantorovich-type operators.
4. Voronovskaya-Type Expansions for Kantorovich and Quasi-Interpolation Operators
We now extend the previous results to Kantorovich-type and quasi-interpolation neural network operators. Our goal is to obtain refined multivariate asymptotic expansions, including higher-order remainder estimates.
Theorem 10
(Refined Multivariate Voronovskaya Expansion).
Let f be sufficiently smooth with bounded derivatives up to the required order, and let x ∈ ℝ^N. For sufficiently large n and 0 < β < 1, the following expansion holds for the Kantorovich operator:
where the remainder satisfies the uniform estimate
with a constant independent of x and n. The same expansion holds for the quasi-interpolation operator with the corresponding weights.
Proof. Define the path function for t ∈ [0, 1]:
By the chain rule, its derivatives are
Applying the Taylor expansion with integral remainder gives
For a grid point k/n, we can write
with the remainder term
Using the grid uniformity assumption
we obtain the bound
Finally, applying the Kantorovich operator (or the quasi-interpolation operator) and summing over all grid points yields the uniform remainder estimate in (78), completing the proof. □
Remark 7
(Uniform Convergence and Higher-Order Terms). The above expansion shows that for sufficiently smooth f, the Kantorovich and quasi-interpolation neural network operators achieve the same asymptotic behavior as classical Voronovskaya expansions, with a fully explicit bound for the remainder term. This allows rigorous estimates of the rate of convergence in terms of n, β, N, and the smoothness of f.
4.1. Implications for Approximation Rates
Let f and x be as in the previous theorems. From the Voronovskaya-type expansion (85)–(88), we obtain the asymptotic estimate
which provides the precise rate of convergence in terms of the grid parameter n and the smoothness index m.
Furthermore, if all partial derivatives of order up to m vanish at x, i.e.,
then the remainder term dominates the expansion, yielding the higher-order estimate
which explicitly illustrates the gain in the convergence rate due to higher-order smoothness.
These estimates highlight two important aspects:
- 1.
The general rate (89) depends explicitly on the balance between the smoothness m and the chosen parameter β, giving a controlled decay of the approximation error.
- 2.
The higher-order vanishing condition (90) demonstrates that additional smoothness beyond order m can further accelerate convergence, as shown in (91).
5. Second-Order Multivariate Voronovskaya Expansions and Estimates
We refine the previous results by including second-order multivariate terms and deriving convergence estimates in L^p-norms. This allows a precise comparison among the three neural network operators.
Theorem 11
(Second-Order Multivariate Voronovskaya Expansion).
Let f be sufficiently smooth with bounded derivatives up to the required order, and let x ∈ ℝ^N. For sufficiently large n and 0 < β < 1, the following expansion holds for the Kantorovich-type operator:
where the remainder satisfies the uniform and L^p estimate:
The same expansion holds for quasi-interpolation operators with corresponding weights.
Proof. We extend the path-function technique: for t ∈ [0, 1], define
with derivatives
Applying the multivariate Taylor expansion with integral remainder gives
Decomposing the second-order terms explicitly for cross derivatives yields
For a uniform grid with mesh size 1/n, the remainder satisfies
Finally, summing over the grid points with Kantorovich or quasi-interpolation weights, we obtain the uniform and L^p bound
which completes the proof. □
Corollary 3
(Convergence Rate in the L^p Norm).
Under the hypotheses of Theorem 11, for 1 ≤ p ≤ ∞, the following convergence rates hold:
Moreover, if all derivatives vanish up to order m, i.e., every partial derivative of order between 1 and m vanishes at x, the remainder dominates and we obtain the higher-order estimate:
highlighting the gain due to additional smoothness of f.
Remark 8
(Comparison of Operators). The asymptotic expansions show that the quasi-interpolation, Kantorovich-type, and quadrature-type operators share the same principal term up to order m, with differences manifesting only in the remainder term. Kantorovich-type operators typically provide better L^p-stability due to integration over the cells, whereas quasi-interpolation operators preserve pointwise accuracy and allow explicit control over cross-derivative contributions.
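A rough numerical comparison in the spirit of Remark 8 measures discrete sup- and mean-absolute errors of the quasi-interpolation and Kantorovich-type sketches over a sample grid; these discrete norms merely stand in for the L^p norms of the analysis, and every parameter choice is illustrative.

```python
import numpy as np

def g(x, q, lam):
    return np.tanh(lam * x - 0.5 * np.log(q))   # overflow-safe assumed deformed tanh

def phi(x, q=2.0, lam=1.5):
    psi = lambda t, qq: 0.25 * (g(t + 1.0, qq, lam) - g(t - 1.0, qq, lam))
    return 0.5 * (psi(x, q) + psi(x, 1.0 / q))

def quasi(f, x, n, K=40):
    k = np.arange(int(n * x) - K, int(n * x) + K + 1)
    return np.sum(f(k / n) * phi(n * x - k))

def kantorovich(f, x, n, K=40, m=32):
    k = np.arange(int(n * x) - K, int(n * x) + K + 1)
    t = (np.arange(m) + 0.5) / (m * n)
    avg = np.array([np.mean(f(kj / n + t)) for kj in k])
    return np.sum(avg * phi(n * x - k))

f, n = np.sin, 200
xs = np.linspace(0.0, 1.0, 101)
errA = np.array([quasi(f, x, n) - f(x) for x in xs])
errK = np.array([kantorovich(f, x, n) - f(x) for x in xs])
print("sup-norm errors :", np.abs(errA).max(), np.abs(errK).max())
print("mean-abs errors :", np.abs(errA).mean(), np.abs(errK).mean())
```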
Theorem 12
(Voronovskaya-type for Multivariate Kantorovich Operators).
Let f ∈ C^m(ℝ^N) and let the multivariate Kantorovich operator be as defined above. Then, for each x ∈ ℝ^N, we have
as n → ∞, for any 0 < β < 1, where α ranges over multi-indices with |α| ≤ m and f_α denotes the partial derivative of order α.
Proof. Using the multivariate Taylor expansion with integral remainder, for f ∈ C^m(ℝ^N) and a grid point k/n, we write
with the remainder
Using the supremum norm of the m-th derivatives, we obtain the estimate
Integrating over the cell and summing over k, we define
which represents the total remainder contribution of the operator. By the above estimate, we have
implying
and for any 0 < β < 1,
The result follows by splitting the operator into the main polynomial part and the remainder. □
Theorem 13
(Voronovskaya-type Expansion for Quadrature Operators).
Let the multivariate quadrature-type operator be as defined above, and let f ∈ C^m(ℝ^N). Then, for each x ∈ ℝ^N,
as n → ∞, for any 0 < β < 1.
Proof. We proceed in a series of detailed steps:
For a fixed quadrature node, we expand f around x:
where the remainder admits the integral form
Using the uniform boundedness of the derivatives of order m, we get
The operator involves weighted sums over the quadrature points indexed by r, with the corresponding weights. Thus, for each k, we have
Next, summing over k with the kernel Z, which decays sufficiently fast, yields
Using the decay property of Z and standard estimates for lattice sums, we get
The parameter β reflects the trade-off between kernel truncation error and Taylor remainder. Choosing it appropriately ensures that the remainder is of higher order, providing an optimal convergence rate for the remainder.
Combining the main term from the Taylor expansion with the remainder estimate gives
which establishes the claimed Voronovskaya-type expansion. □
6. Voronovskaya-Type Expansion in Sobolev Spaces
Theorem 14
(Voronovskaya Expansion in Sobolev Spaces).
Let f be sufficiently smooth, n ∈ ℕ, and 0 < β < 1. Assume that the moments
exist for all multi-indices α. Then, for the quasi-interpolation operator,
where the leading coefficients and the remainder term are given by
Proof. For each lattice point, the multivariate Taylor expansion with integral remainder gives
where the remainder term admits the integral representation
By linearity of the operator and the definition
we have
Using the definition of the moments and the scaling properties of Z, we obtain
recovering the main term of the expansion:
The remainder can be expressed as a discrete convolution:
Differentiating under the summation (allowed since Z is smooth and rapidly decaying) and applying Minkowski's inequality, we get
for all multi-indices of admissible order.
Passing to the Riemann sum and applying Young's inequality for discrete convolutions yields
Hence, the remainder satisfies (123), and combining (129) with (130) completes the proof. □
7. Quantitative Rate with Explicit Constants
Theorem 15
(Quantitative Approximation Rate).
Let f be as above, and assume the kernel moments
exist and are finite for all multi-indices α. Then there exists an explicit constant such that
Proof. For each lattice point, the multivariate Taylor expansion with integral remainder yields
with the remainder
By linearity of the operator,
where
Since Z is rapidly decaying, the discrete sums are well defined and the tail contributions are negligible.
We change variables, writing u for the rescaled argument, so that
Taking norms and applying Minkowski's inequality:
The sum over u approximates the integral
Hence,
with C given explicitly by (134).
Combining the estimates for the main term and the remainder, we obtain the desired quantitative rate (135). □
8. Unified Voronovskaya-Type Expansion in Sobolev Spaces with Explicit Constants
Theorem 16
(Voronovskaya-Type Expansion in Sobolev Spaces with Explicit Constants).
Let f be sufficiently smooth, let n ∈ ℕ and 0 < β < 1, and let x ∈ ℝ^N. Assume the kernel Z has finite moments
and is rapidly decaying. Then for the quasi-interpolation operator we have
with remainder satisfying the explicit quantitative bound
where the constant is given explicitly by
Proof. For each lattice point, consider the multivariate Taylor expansion around x:
with integral remainder
By linearity of the operator, we obtain
where the correction term accounts for the approximation error in replacing discrete sums by the exact moments. Rapid decay of Z ensures that this correction is negligible.
Performing the appropriate change of variables, we then have
Applying the discrete Minkowski inequality and standard Sobolev shift estimates, we deduce
with the constant given explicitly in (147).
Combining all these steps, the Voronovskaya-type expansion in Sobolev space with an explicit quantitative rate is established. □
9. Uniform Stability of Voronovskaya-Type Expansions under Variations
Definition 3
(Uniform Kernel with Controlled Moments).
Let U be compact. We say that the family of kernels is uniform with controlled moments if
Theorem 17
(Uniform Stability).
Let the kernel family be uniform with controlled moments on a compact set U, and let f be sufficiently smooth. Denote by the corresponding operators those constructed from these kernels. Then the Voronovskaya-type expansions
are uniform in the parameter ranging over U, in the sense that
with constants in the estimates depending only on the suprema in (153).
Proof. For each parameter value and lattice point, consider the multivariate Taylor expansion of f around x:
where the last term is the integral remainder.
By linearity of the operator, we write
where the correction term accounts for the discrete-to-continuum moment approximation error.
By assumption (153), the relevant integrals are uniformly bounded over U. Therefore, each term in the remainder estimate satisfies
The uniform integrability of the kernel family allows the application of the dominated convergence theorem as n → ∞, yielding the uniform bound (155) and establishing the uniformity of the Voronovskaya expansions over U. □
10. Unified Voronovskaya-Type Expansion with Explicit Constants and Uniform Stability
10.1. Setup
Let f be sufficiently smooth, n ∈ ℕ, 0 < β < 1, and x ∈ ℝ^N. Let Z (or a parametric family of kernels) have the moments
and assume rapid decay and, for parametric families, uniform boundedness:
Define the multivariate quasi-interpolation operators
10.2. Unified Voronovskaya-Type Theorem
Theorem 18
(Compact Voronovskaya Expansion with Explicit Constants).
Under the above assumptions, the operator satisfies
where the remainder satisfies the uniform quantitative estimate
For parametric families, the expansion is uniform:
Moreover, this yields an explicit rate
Proof. Expand f using the multivariate Taylor formula of order m with integral remainder:
with
Applying the operator gives (163) with
For the remainder, differentiate under the sum:
By Young's inequality and the uniform boundedness of the kernel moments, this yields (164). For parametric families, dominated convergence and the uniform bounds in (161) imply uniformity over U. □
10.3. Remarks
This theorem simultaneously captures the classical Voronovskaya expansion, quantitative rates, Sobolev estimates, and uniform stability under parametric families.
Constants are fully explicit, depending only on kernel moments and derivatives.
The framework is directly applicable to numerical analysis, multivariate approximation, and theoretical studies of Sobolev-space operators.
We develop a rigorous framework for adaptive Voronovskaya-type expansions for multivariate neural network operators with dynamically deformed hyperbolic tangent activations. This extends classical asymptotic expansions to non-stationary kernels, where the deformation parameters adapt with the approximation scale. Our results establish accelerated convergence rates in Sobolev spaces while ensuring uniform control of derivatives up to order s. Precise remainder estimates are provided, demonstrating the interplay between kernel deformation and Sobolev regularity, and enabling applications in high-dimensional function approximation and adaptive deep learning architectures.
11. Setup and Hypotheses for Sobolev-Santos Uniform Adaptive Convergence Theorem
Definition 4
(Adaptive Quasi-Interpolation Operator).
Let f : ℝ^N → ℝ be a given function. Define the adaptive operator
where the multivariate adaptive kernel is
The deformation and scaling parameters are adaptive in the scale n, satisfying
Hypotheses.
- 1.
Controlled Deformation: There exist constants bounding the adaptive deformation and scaling parameters away from degenerate values, uniformly for all n (a hypothetical instance is sketched after these hypotheses).
- 2.
Uniform Exponential Decay: For all multi-indices α with |α| ≤ s, there exists a constant such that
- 3.
Uniformly Bounded Moments: For all n,
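To make the adaptive setting concrete, the sketch below instantiates one hypothetical choice of scale-dependent parameters satisfying the controlled-deformation hypothesis and evaluates the corresponding adaptive quasi-interpolation operator. The schedules q_n = 1 + 1/n and λ_n = 1 + 1/log(n + 1), as well as the kernel form, are invented for illustration and are not prescribed by the text.

```python
import numpy as np

def g(x, q, lam):
    return np.tanh(lam * x - 0.5 * np.log(q))   # overflow-safe assumed deformed tanh

def phi(x, q, lam):
    psi = lambda t, qq: 0.25 * (g(t + 1.0, qq, lam) - g(t - 1.0, qq, lam))
    return 0.5 * (psi(x, q) + psi(x, 1.0 / q))

def adaptive_params(n):
    """Hypothetical schedules, bounded away from degenerate values for all n."""
    return 1.0 + 1.0 / n, 1.0 + 1.0 / np.log(n + 1.0)

def adaptive_quasi(f, x, n, K=60):
    q_n, lam_n = adaptive_params(n)
    k = np.arange(int(n * x) - K, int(n * x) + K + 1)
    return np.sum(f(k / n) * phi(n * x - k, q_n, lam_n))

f, x0 = np.sin, 0.4
for n in (10, 100, 1000):
    print(n, adaptive_params(n), adaptive_quasi(f, x0, n) - f(x0))
```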
Theorem 19
(Sobolev-Santos Uniform Adaptive Convergence).
Let f belong to the Sobolev space W^{s,p}(ℝ^N), with s ∈ ℕ and 1 ≤ p ≤ ∞. Assume the adaptive quasi-interpolation operator satisfies the hypotheses of controlled deformation, uniform exponential decay, and bounded moments. Then, for sufficiently large n, the following adaptive Voronovskaya expansion holds:
where the multivariate moments are
and the remainder term satisfies the explicit estimate
with a smooth, bounded function quantifying the deviation from stationarity:
Moreover, the expansion is uniform in the Sobolev norm up to order s, i.e.,
- 1.
The deviation factor explicitly captures the influence of adaptive kernel deformation on the remainder, providing a quantitative measure of non-stationarity.
- 2.
If the deformation and scaling parameters are held fixed (independent of n), the theorem recovers the classical stationary Voronovskaya expansion.
- 3.
The theorem can be extended to fractional Sobolev spaces with non-integer s, allowing finer control for functions with limited smoothness.
- 4.
Uniformity in s ensures stability of derivatives up to order s, which is crucial for high-dimensional deep learning applications where gradients are propagated through multiple layers.
Proof. Let x ∈ ℝ^N and let n be sufficiently large. The proof proceeds in several steps.
For any lattice point, by Taylor's theorem with integral remainder, we have
where the remainder is explicitly
Substituting the grid points in (171) yields
By definition of the multivariate moments and using the discrete-to-continuum approximation via the kernel's exponential decay, we have
Hence, the main contribution in (174) is
Define the remainder operator
Applying Minkowski's inequality and using the uniform exponential decay and bounded moments of the adaptive kernel, we obtain
where the constant is independent of n and the exponent is the decay rate arising from the kernel's scaling.
The smoothness of the adaptive kernel guarantees that differentiation commutes with the operator up to order s, so for all multi-indices α with |α| ≤ s,
This completes the proof. □
12. Results
This study establishes rigorous theoretical results for symmetrized hyperbolic tangent neural network operators, with a focus on the novel Sobolev-Santos Uniform Convergence Theorem. The main contributions are as follows:
- 1.
Voronovskaya-Type Expansions for Basic Operators: For functions f ∈ C^m(ℝ^N), the approximation error of the basic quasi-interpolation operator admits an asymptotic expansion:
This expansion provides explicit convergence rates, dependent on the smoothness m of f and the grid parameter n.
- 2.
Refined Expansions for Kantorovich and Quadrature Operators: For sufficiently smooth f, the Kantorovich operator satisfies a refined expansion:
where the remainder is explicitly bounded, demonstrating higher-order accuracy.
- 3.
Sobolev Space Estimates: The approximation error in the Sobolev space W^{s,p} is bounded by:
This result provides quantitative estimates for the convergence rate, with explicit constants derived from the moments of the kernel function.
- 4.
Sobolev-Santos Uniform Convergence Theorem: The Sobolev-Santos Theorem (Theorem 19) establishes that for adaptive quasi-interpolation operators, the following expansion holds:
where the remainder satisfies the explicit estimate:
The associated function quantifies the deviation from stationarity, ensuring uniform stability under parametric variations of the activation function. This theorem is pivotal for applications requiring adaptive kernel deformation, such as high-dimensional deep learning architectures.
- 5.
Uniform Stability Under Parametric Variations: The expansions remain uniformly valid even when the activation function parameters vary, ensuring robustness in practical applications. This stability is critical for adaptive neural network architectures, where parameters may dynamically adjust during training or optimization.
13. Conclusions
This work advances the theory of neural network approximation by introducing symmetrized hyperbolic tangent-based operators and deriving Voronovskaya-type asymptotic expansions for their multivariate counterparts. The Sobolev-Santos Uniform Convergence Theorem is a cornerstone of this study, providing a rigorous framework for adaptive quasi-interpolation operators with dynamically deformed activation functions. This theorem ensures that the operators can approximate smooth functions and their derivatives with high accuracy, even as the deformation parameters evolve, while maintaining uniform control over the convergence rates in Sobolev spaces.
The explicit constants and uniform bounds derived in this study offer a solid foundation for both theoretical and applied research in neural network-based function approximation. The results highlight the superior performance of these operators in high-dimensional approximation problems, with direct implications for artificial intelligence, numerical analysis, and data-driven modeling. The uniform stability under parametric variations further enhances their applicability in adaptive deep learning architectures, where robustness and flexibility are essential.
Future research directions include exploring adaptive grid strategies, extending the framework to fractional Sobolev spaces, and generalizing the results to non-Euclidean domains. These advancements could further expand the applicability of symmetrized hyperbolic tangent neural networks in modern computational frameworks, particularly in scenarios requiring high-dimensional function approximation and adaptive learning.
Acknowledgments
Santos gratefully acknowledges the support of the PPGMC Program for the Postdoctoral Scholarship PROBOL/UESC nr. 218/2025. Sales would like to express his gratitude to CNPq for the financial support under grant 30881/2025-0.
References
- Anastassiou, G. A. (1997). Rate of convergence of some neural network operators to the unit-univariate case. Journal of Mathematical Analysis and Applications, 212(1), 237-262. [CrossRef]
- Anastassiou, G. (2000). Quantitative approximations. Chapman and Hall/CRC. [CrossRef]
- Anastassiou, G. A. (2016). Intelligent Systems II: Complete Approximation by Neural Network Operators (Vol. 608). Cham: Springer International Publishing. [CrossRef]
- Anastassiou, G. A. (2023). Parametrized, deformed and general neural networks. Berlin/Heidelberg, Germany: Springer. [CrossRef]
- Chen, Z., & Cao, F. (2009). The approximation operators with sigmoidal functions. Computers and Mathematics with Applications, 58, 758-765. [CrossRef]
- Haykin, S. (1994). Neural networks: a comprehensive foundation. Prentice hall PTR.
- McCulloch, W. S., & Pitts, W. (1943). A logical calculus of the ideas immanent in nervous activity. The bulletin of mathematical biophysics, 5(4), 115-133. [CrossRef]
- Yu, D., & Cao, F. (2025). Construction and approximation rate for feedforward neural network operators with sigmoidal functions. Journal of Computational and Applied Mathematics, 453, 116150. [CrossRef]
- Yoo, J., Kim, J., Gim, M., & Lee, H. (2024). Error estimates of physics-informed neural networks for initial value problems. Journal of the Korean Society for Industrial and Applied Mathematics, 28(1), 33-58.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).