Abstract Syntax Tree (AST)#
One of the key components of yaflux
is the ability to perform static analysis on the declared workflow at initialization time.
Wrapped methods of the @yf.step
decorator can have their source code directly inspected and validated for correctness.
Because decorators are evaluated at class definition time, we can perform these checks before any analysis steps are run, and provide immediate feedback if the workflow is incorrect.
This feature is similar to compile-time errors in statically typed languages, but in this case, we are checking the structure of the workflow rather than the types of the variables.
This stops you from running a workflow that is guaranteed to fail, and helps catch errors early in the development process.
Validations#
Dependency Usage#
One of the primary validations performed on the AST is to ensure that all results used by a step are declared as requirements.
For example, consider the following step:
import yaflux as yf
class Analysis(yf.Base):
@yf.step(creates="processed_data", requires="raw_data")
def process_data(self) -> int:
return self.results.raw_data * 2
We can inspect the source code of the process_data
method and verify that the raw_data
result is declared as a requirement.
Lets instead create a step that uses an undeclared result:
import yaflux as yf
class Analysis(yf.Base):
@yf.step(creates="processed_data")
def process_data(self) -> int:
# does not declare 'raw_data' as a requirement
return self.results.raw_data * 2
A nice benefit of the @yf.step
decorator is that when this class definition is loaded, yaflux
will raise an error indicating that the raw_data
result is used without being declared as a requirement.
import yaflux as yf
try:
class Analysis(yf.Base):
@yf.step(creates="processed_data")
def process_data(self) -> int:
# does not declare 'raw_data' as a requirement
return self.results.raw_data * 2
except yf.AstUndeclaredUsageError as e:
print(e)
This validation is crucial for ensuring that the declared workflow is correct and that all dependencies are explicitly stated.
Direct Assignments#
One of the important characteristics of yaflux
is that results must be tracked by the framework to ensure immutability and stop side effects.
This means that direct assignments to self
is not allowed, as it would bypass the tracking mechanism.
For example, consider the following step:
import yaflux as yf
class Analysis(yf.Base):
@yf.step(creates="processed_data", requires="raw_data")
def process_data(self) -> int:
self.alias_data = self.results.raw_data * 2
return self.alias_data
In this case, the processed_data
result is assigned directly to self
, which bypasses the tracking mechanism.
It also makes it harder to reason about the state of the analysis and can lead to unexpected behavior.
When this class definition is loaded, yaflux
will raise an error indicating that direct assignments are not allowed.
import yaflux as yf
try:
class Analysis(yf.Base):
@yf.step(creates="processed_data", requires="raw_data")
def process_data(self) -> int:
self.alias_data = self.results.raw_data * 2
return self.alias_data
except yf.AstSelfMutationError as e:
print(e)