In this post, I’ll compare the new (post v4) and old (pre v4) versions of the Aloha code base and try to give some motivation for the changes. Along the way, I hope to provide a little insight and describe the direction in which I would like to see Aloha progress in the future. The main changes in Aloha 4 are:
- Simplified ModelFactory creation and usage.
- Drastically simplified Model APIs.
- Inclusion of Auditors, a score decorator based on a variation of the ideas in my Recursive Auditing post.
- Modularization of model I/O types requiring additional software dependencies.
A Note About Types
The code in this post uses the following type parameters, with the associated meanings:
- U: The upper type bound of a model’s codomain.
- N: The natural output type of a model. For instance, a regression model has a real-valued natural output type, which in Aloha is a Double. Other model types, such as decision trees, have flexible output types and can have any natural output type.
- A: The domain of a model.
- B: The codomain of a model, upper bounded by U.
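To make the roles of these type parameters concrete, here is a small Scala sketch. It assumes a hypothetical linear regression over a single feature, with U = Option[Any] and B = Option[Double]; the names Features and TypeRoles are illustrative only and are not part of Aloha.

```scala
// Hypothetical illustration of the four type parameters; none of these
// names are Aloha APIs.

// A: the domain, i.e. the model's input type.
final case class Features(x: Double)

object TypeRoles {
  // N: the natural output type. A linear regression naturally yields a Double.
  type N = Double

  // U: an upper bound shared by the outputs of all models in one setting.
  type U = Option[Any]

  // B: the codomain actually returned; it must satisfy B <: U.
  type B = Option[N]

  // Compile-time evidence that B <: U (Option is covariant, so Option[Double] <: Option[Any]).
  val evidence: B <:< U = implicitly[B <:< U]

  // The natural computation (A => N), then wrapped into the codomain (A => B).
  val natural: Features => N = f => 2.0 * f.x + 1.0
  val model: Features => B = f => Some(natural(f))
}
```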
Simplified Model Factories
Pre 4.0.0 Model Factories
In Aloha 3, model factories could create models in two ways: by calling the getModel method with the appropriate type parameters and implicit parameters, or by creating a TypedModelFactory capable of producing only models with one specific input and output type.
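The two pre-4.0.0 creation styles can be sketched roughly as follows. This is a simplified, hypothetical illustration: OutputParser, UntypedFactory, and TypedFactory are made-up names standing in for Aloha 3's factories and their implicit parameters, and a model spec is reduced to a constant output value.

```scala
import scala.util.Try

// Hypothetical sketch of the two pre-4.0.0 creation styles. OutputParser,
// UntypedFactory, and TypedFactory are illustrative names only; they are
// not Aloha 3's actual classes or signatures.

// Stand-in for the implicit parameters a call site had to supply.
trait OutputParser[B] { def parse(s: String): Try[B] }

object OutputParser {
  implicit val doubleParser: OutputParser[Double] =
    new OutputParser[Double] { def parse(s: String): Try[Double] = Try(s.trim.toDouble) }
}

// Way 1: an untyped factory; every call site chooses A and B.
object UntypedFactory {
  // To keep the sketch small, a "model spec" is just a constant output value.
  def getModel[A, B](spec: String)(implicit p: OutputParser[B]): Try[A => B] =
    p.parse(spec).map(b => (_: A) => b)
}

// Way 2: a typed factory pinned to one input type A and one output type B.
final class TypedFactory[A, B](implicit p: OutputParser[B]) {
  def getModel(spec: String): Try[A => B] = UntypedFactory.getModel[A, B](spec)
}
```

Both styles build the same model from the same spec, which is exactly the kind of duplication the 4.0.0 factory changes remove.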
4.0.0 Model Factories
Prior to Aloha 4, ModelFactory was more flexible than TypedModelFactory, but after Aloha had been in the wild for an extended period, this additional flexibility proved unnecessary, especially in a world dominated by microservices. Since minimizing the number of ways to use a library minimizes its footprint, and a smaller footprint is easier to maintain, I removed the notion of an untyped model factory in Aloha 4. Instead, ModelFactory was made a sealed, abstract base class with exactly one implementation, ModelFactoryImpl.
Creation methods in the ModelFactory companion object return instances of ModelFactoryImpl, which can easily be upcast to a ModelFactory. This leads to the first lesson:
Constrain the number of ways to accomplish the same goal. Do not allow two ways to do something when one will suffice.
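Here is a minimal sketch of the Aloha 4 factory shape described above. It assumes simplified, hypothetical names (Factory, FactoryImpl, create) rather than Aloha's actual signatures: a sealed base type with a single implementation that additionally carries the natural output type N, obtainable only through the companion object.

```scala
import scala.util.Try

// Hypothetical sketch: a sealed base type with exactly one implementation,
// created only via the companion. Names and signatures are illustrative,
// not Aloha's actual API.
sealed abstract class Factory[U, -A, +B <: U] {
  def fromSpec(spec: String): Try[A => B]
}

// The single implementation. Unlike the base type, it also carries N.
final case class FactoryImpl[U, N, A, B <: U](
    parseNatural: String => Try[N], // parses a spec into a natural output value
    toOutput: N => B                // wraps the natural output into the codomain
) extends Factory[U, A, B] {
  def fromSpec(spec: String): Try[A => B] =
    parseNatural(spec).map(n => (_: A) => toOutput(n))
}

object Factory {
  // The one supported way to obtain a factory.
  def create[U, N, A, B <: U](
      parseNatural: String => Try[N],
      toOutput: N => B
  ): FactoryImpl[U, N, A, B] =
    FactoryImpl[U, N, A, B](parseNatural, toOutput)
}
```

Because FactoryImpl extends Factory, the result of create can be assigned to a Factory-typed value without an explicit cast, while callers who care about N can keep the more specific type.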
Pre 4.0.0 Models
The signature says it all! It smacks of poor design. The score method was, unfortunately, a requirement (not a choice), which had a few major implications. The first is that since Score is a (now outdated) v2.4.1 Protocol Buffer message, the protobuf v2.4.1 library was a dependency of aloha-core and part of its most widely used API. Over time, this caused many protobuf version conflicts (DLL Hell), especially as other libraries adopted more modern versions of protobuf. This leads to the second lesson:
In major APIs, avoid types that require additional dependencies. This is especially true in core libraries.
Another problem with the API above is that, as more basic versions of prediction functions were added (such as ones returning Eithers), I had to allow each of these output representations to be produced in the model implementations. The other problem, which is entirely my fault, is the apply method. An Option can easily be created via scoreAsEither(a).right.toOption, but since this was a nuisance, I added an apply method that returns an Option anyway. This clearly violates the first lesson by providing multiple ways to accomplish the same goal.
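Based only on the methods mentioned above, the shape of the old model API can be sketched as follows. This is a hypothetical reconstruction, not Aloha 3's code: FakeScore stands in for the protobuf-generated Score, and the signatures are assumptions. Note that apply is derivable from scoreAsEither (via .toOption, the modern equivalent of .right.toOption), so the trait offers several ways to get the same prediction.

```scala
// Hypothetical reconstruction of the *shape* of the pre-4.0.0 model API.
// FakeScore stands in for the protobuf-generated Score class; nothing here
// is Aloha 3's real code.
final case class FakeScore(value: Option[Any], errors: Seq[String])

trait OldModel[-A, +B] {
  // Required: every model had to produce a protobuf-style score.
  def score(a: A): FakeScore

  // Added later: a more basic prediction function.
  def scoreAsEither(a: A): Either[Seq[String], B]

  // Added for convenience, yet fully derivable from scoreAsEither.
  def apply(a: A): Option[B] = scoreAsEither(a).toOption
}

// Every implementation must supply both score and scoreAsEither.
final class Halver extends OldModel[Double, Double] {
  def scoreAsEither(a: Double): Either[Seq[String], Double] =
    if (a.isNaN) Left(Seq("input was NaN")) else Right(a / 2)

  def score(a: Double): FakeScore =
    scoreAsEither(a).fold(errs => FakeScore(None, errs), b => FakeScore(Some(b), Nil))
}
```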
4.0.0 Models
In Aloha 4, models are simply functions with identifiers that can be cleaned up. Dead simple. Enough said!
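A minimal sketch of what a function with an identifier that can be cleaned up might look like in Scala. ModelId, SimpleModel, and ConstantModel are hypothetical names assumed for illustration; Aloha's real model trait may differ in its details.

```scala
import java.io.Closeable

// Hypothetical sketch: a model is a function with an identifier that can be
// closed. ModelId, SimpleModel, and ConstantModel are illustrative names.
final case class ModelId(id: Long, name: String)

trait SimpleModel[-A, +B] extends (A => B) with Closeable {
  def modelId: ModelId
}

// A trivial model that ignores its input and returns a fixed value.
final class ConstantModel[B](val modelId: ModelId, value: B) extends SimpleModel[Any, B] {
  private var closed = false

  def apply(a: Any): B = {
    require(!closed, s"model ${modelId.name} is closed")
    value
  }

  // Release resources; here it only flips a flag.
  def close(): Unit = closed = true

  def isClosed: Boolean = closed
}
```

Since such a model is an ordinary Scala function, it can be passed directly to map and other higher-order functions.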
To avoid reimplementing models while still allowing different model output representations, I chose to add the ability to inject a score decorator into a model. In Aloha 4, this takes the form of Auditor[U, -N, +B <: U].
By allowing auditor instances to parameterize the model factory, the factory can inject all models and submodels with auditors of the appropriate type. This enables models in Aloha 4 to be truly generic in their output type. It also allows the Score-related code to be easily refactored into its own module, aloha-proto. Because of this, I could easily remove the protobuf dependency from aloha-core.
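The following sketch shows how injecting an auditor lets a single model implementation produce different output representations. SketchAuditor, its success and failure methods, OptionAuditor, EitherAuditor, and SqrtModel are all hypothetical simplifications of the Auditor[U, -N, +B <: U] idea, not Aloha's actual API.

```scala
// Hypothetical, simplified score decorator in the spirit of
// Auditor[U, -N, +B <: U]. Method names and signatures are assumptions.
trait SketchAuditor[U, -N, +B <: U] {
  def success(modelId: String, value: N): B
  def failure(modelId: String, errors: Seq[String]): B
}

// Output representation 1: Option, with upper bound U = Option[Any].
final class OptionAuditor[N] extends SketchAuditor[Option[Any], N, Option[N]] {
  def success(modelId: String, value: N): Option[N] = Some(value)
  def failure(modelId: String, errors: Seq[String]): Option[N] = None
}

// Output representation 2: Either with errors, with U = Either[Seq[String], Any].
final class EitherAuditor[N]
    extends SketchAuditor[Either[Seq[String], Any], N, Either[Seq[String], N]] {
  def success(modelId: String, value: N): Either[Seq[String], N] = Right(value)
  def failure(modelId: String, errors: Seq[String]): Either[Seq[String], N] = Left(errors)
}

// One model implementation, written once; the injected auditor picks the output type.
final class SqrtModel[U, B <: U](auditor: SketchAuditor[U, Double, B]) extends (Double => B) {
  def apply(x: Double): B =
    if (x < 0) auditor.failure("sqrt", Seq(s"negative input: $x"))
    else auditor.success("sqrt", math.sqrt(x))
}
```

The square-root logic exists in one place; only the injected auditor changes, and the compiler enforces B <: U for each pairing.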
One additional thing to notice is the set of type parameters in Auditor. This could easily have been done with a unary type constructor (e.g., Auditor[M[_], N]), but Java cannot express higher-kinded types like M[_], and I had already put so much effort into making Aloha work with Java that I didn’t want to introduce one into an important API. This leads to the third lesson:
Pick the languages a project should support. Don’t use niceties from one language at the expense of usability from another language. If you must use language-specific niceties and idioms, create separate tailored language-specific APIs.
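As a rough illustration of this trade-off, here are the two auditor shapes side by side. HkAuditor and BoundedAuditor are hypothetical, simplified traits, not Aloha code. The higher-kinded version is more concise in Scala, while the bounded version uses only first-order type parameters; a bound like B <: U can be written in Java generics as B extends U.

```scala
// Hypothetical comparison of two auditor shapes; neither is Aloha's API.

// Higher-kinded shape: concise in Scala, but Java has no equivalent of M[_].
trait HkAuditor[M[_], N] {
  def success(value: N): M[N]
}

// Bounded shape: only ordinary type parameters plus a bound B <: U,
// which Java can express as <U, N, B extends U>.
trait BoundedAuditor[U, -N, +B <: U] {
  def success(value: N): B
}

object OptionHk extends HkAuditor[Option, Int] {
  def success(value: Int): Option[Int] = Some(value)
}

object OptionBounded extends BoundedAuditor[Option[Any], Int, Option[Int]] {
  def success(value: Int): Option[Int] = Some(value)
}
```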
A Note on the N Type Parameter in ModelFactoryImpl
In addition to the type parameters it shares with ModelFactory, ModelFactoryImpl is parameterized by the type N, the natural output type of the top-level models it produces.
The N type parameter could have been avoided, because morphable auditors can create new auditors for a different natural output type, but I want Aloha users to know the natural output type of the models produced by a factory. This is because auditors can produce coproducts, and a model score could be consumed much later, possibly on a different machine. So, if the output type is a coproduct like Score in aloha-proto, and the consumer expects a classifier result like a string or integer, but the factory instantiates a regression model with a real-valued natural output type, then the consumer will encounter an error and the team responsible for the factory may be oblivious. This situation can be exacerbated when the scoring infrastructure (factories and models) is controlled by a different team than the one that produces the models. This type constraint can be seen as a consistency check across teams that ensures, within reason, that an appropriate class of models is used for the problem domain.
If N is incongruous with the natural output type of a model that the factory attempts to produce, the factory can raise an error at model creation time and avoid this situation entirely. This leads to the fourth lesson:
Fail early by design.
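One way to fail early, sketched under assumptions: the parsed model description declares its natural output type as a runtime Class, and a factory pinned to N compares it against ClassTag[N] before building anything. ModelSpec and CheckedFactory are hypothetical names, and this ClassTag comparison is an illustrative technique, not necessarily how Aloha performs the check.

```scala
import scala.reflect.ClassTag
import scala.util.{Failure, Success, Try}

// Hypothetical, untyped model description (e.g. parsed from JSON). Its natural
// output type is only known at runtime, via naturalOutput.
final case class ModelSpec(name: String, naturalOutput: Class[_], run: Any => Any)

// Hypothetical factory pinned to one natural output type N.
final class CheckedFactory[N](implicit tag: ClassTag[N]) {
  def create(spec: ModelSpec): Try[Any => N] =
    if (spec.naturalOutput != tag.runtimeClass)
      // Fail at creation time, before any score is produced or shipped elsewhere.
      Failure(new IllegalArgumentException(
        s"${spec.name} has natural output ${spec.naturalOutput.getName}, " +
          s"but this factory only creates models with natural output ${tag.runtimeClass.getName}"))
    else
      // Declared types agree, so treat the untyped function as returning N.
      Success((a: Any) => spec.run(a).asInstanceOf[N])
}
```

With this design, a factory meant for classifiers rejects a regression spec immediately, instead of letting a downstream consumer discover the mismatch later.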
Over time, Aloha has evolved to include simpler, more thoughtful high-level APIs. Along the way, the API evolution was guided by overcoming mistakes of the past. Hopefully, by disclosing and discussing some of these mistakes, others will be able to avoid them. So, once more:
- Constrain the number of ways to accomplish the same goal. Do not allow two ways to do something when one will suffice.
- In major APIs, avoid types that require additional dependencies. This is especially true in core libraries.
- Pick the languages a project should support. Don’t use niceties from one language at the expense of usability from another language. If you must use language-specific niceties and idioms, create separate tailored language-specific APIs.
- Fail early by design.