an5 - Intelligent Network Design
STATUS - Mar 2023 - updated an5 to have simpler and clearer goal description using "goal", "constraint" and "handler" keywords
STATUS - Jan 2022 - update example to reflect JSON support, as it trendy ;-) , but not necessarily easier to read
STATUS - 18 April 2021 - coded prototype and completed testing and measurement
Telecommunications services is a commodity business. In brief moments something new and shiny pops up that can command a premium price: the phone call, the telegram, the internet, mobile communications. Each of these growth driving cycles is then followed by need to drive out cost as commoditisation sets in and service providers try to find new and better ways automate operations, while improving customer experience. This focus on automation is key to profitability.
Within the network operations space there are three main process areas: Network Design and Build, Fulfilment and Assurance all having need for systems that support or perform some level of network design.
In general each of design & build, fulfilment & assurance focus on a set of tightly constrained areas to apply automation and then chain the various "little blocks" of automation into larger business processes.
This works to the extent that the structure of what is being managed does not change. If the structure does change, like in the case of a change in the underlying network technology or service topology then it is back to the drawing board to re-create the "little blocks" so they can be reassembled.
The "little blocks" are things like:
- Create an ethernet access service from location A to location B (a MEF example)
- Enable an alternate network path, when main path is experiencing unusually heavy load (a typical closed loop assurance response model)
- Connect VNF within NFVi platform up to internet ingress/egress point (an SDN / VNF example)
In each of the these cases the approach is to:
- Find the right template
- Apply the template allowing for specific considered conditions including input parameters, defined variations and resource allocations
- If the template does not allow for condition then fall out into manual work queue
Things that "little block" approach cannot readily do are:
- Design to rebuilt an NFVi underlay network to support sharing of hypervisor attached storage across multiple Virtual Infrastructure Managers (VIM) - this is often a multi-month rebuild, requiring full redesign
- Create a new template to apply to a MEF ethernet access service, after the access technology has changed from FTTP (fibre to the premise) to FTTN (fibre to the node) - this requires the network engineer and the fulfilment developer to create new templates and potentially revised process to handle the new network topology.
Problems like these need an intelligent network design capability. "Intelligent" as this work is typically done by people with specialist knowledge and experience in network design.
My hypothesis is that if you start to apply Artificial Intelligence to the "bigger" network design problem then most of the "little block" problems will be solved at the same time, as these are mostly the final degenerate and terminating cases of the larger "Intelligent Network Design" problem.
This blog posting is about the architecture, implementation and testing of a prototype "intelligent network design" framework.
The purpose is to:
- Provide an example of using AI solution to network build domain and how this is different to typical telco automation
- Outline the key architectural structures of the framework
- Determine if the problem is tractable
- Gauge interest and get feedback in doing further development of the framework
- See how this could be used to support network planning and design (see diagram from Automation Life-Cycle context)
Introduction
I did an analysis on why Fulfilment systems in telecoms were so expensive to keep up to date. The result is that Fulfilment systems can be significant inhibitors to meeting "speed to market" objectives.
I found that these systems:
- dealt with abstractions that were little higher than procedural programming languages
- additional dynamic catalog driven process, allowed behaviour more like what is readily achievable with object oriented programming languages (greater abstraction and polymorphism), but this was not easily achieved,
- they did not make use of inventory data to find alternate solutions
- implementation typically centred around trying to codify network service and configuration designs provided by network engineers, which
- broke when there was any change in the structure of the underlying network (topology changes or network vendor devices changes).
The answer seemed to lie in having Fulfilment systems that were able to accomodate service topology changes, which in turn meant they needed better path finding capabilities. Path tracing is built on top of network inventory.
I documented some of this in my blog "Telco Inventory & Topology - Enabler for Automation". Around 2016 Network Orchestration was being introduced as a way to try to decouple the network facing part of fulfilment from the customer facing part (the old Service Order Management vs Customer Order Management demarcation).
Two common technologies for Orchestration are:
- OASIS TOSCA , which is backed to ETSI and comes from IT world &
- Netconf/YANG from IETF with heavy backing by CISCO / Tail-f, from the IP network world.
Today most Orchestrators still use similar template driven process flows as traditional service order management systems, but now with inventory (or cut down Configuration Management DB - CMDB version of this) and activation more tightly coupled.
While it is relatively simple to implement a better network path tracing mechanism on top of a network graph, this only works when the edges in the graph already exist. If the edges don't exists then the graph based algorithm are not applicable.
A couple of cases where this exact situation occurs, at both a very small level and a larger level are the:
- need to apply a physical jumper link in network to realise a full circuit (such as occurs with cable patch panels, fibre distribution hubs and data centre rack building)
- need to run a new access tails to a given location (new estate building).
These problems either fall into the realm of network design or special purpose code that handles the design / assign logic for the particular case.
It seemed like there was a need to test a more general means to solve these type of problems, using an AI solution search based approach. As the search is not constrained to an existing network this will involve some "groping in the dark" to find a way get get from the known connection point to target point. Any such solution must be able to achieve this whether the "groping" is with physical or logical network constructs.
An AI based based solution is required, as the problem needs to build the constraints (i.e. network) not just resolve them. This will then allow the more typical service establishment fulfilment request to operate, but now with generated templates rather than hand crafted one. The design activity can then become one of verification.
Needs and Principles
If you are considering mostly network path design and follow on activation then the target architecture will need to use graph based inventory model and leverage the built in path search capabilities. The gap is that this presumes that the network design (and hence resulting graph representation) is already available. So ...
Decision #1 - Need to have architecture that is able to build paths from scratch not just navigate across existing paths.
Resolution - Need to implement a network design generating system, which is domain of AI planning and constraint based solvers.
Key Use Case - build a network from a collection of network components, using AI constraints based solver.
Decision #2 - Need to have a way to represent the problem space that is specific to the network design. Existing network representation models are generally not suitable for use with planning solvers (ITU X509), are focused on network configuration (Netconf/YANG) and do not have sufficiently rich network representation (Oasis TOSCA) or are targeted to a different problem such network simulation languages [14] or are just a syntactical representation choices such as JSON.
To resolve this I started to try to specify a network using JSON. It quickly became evident that trying to represent the domain problem via JSON was not intuitive. In fact it tended to obscure the problem. It seemed that the best way to generate JSON would be to write a small Java code prototype and then use it to generate the JSON output from the Java class / instance representation, rather than start with JSON and try to consume this as input into representational model. So a chicken and egg problem.
As the domain needs to represent a network (a form of graph) the representation needed to support, the following types:
- Network - a collection of edge and nodes
- Network Element - a node within graph theory which could be a physical or logical network element
- Link - to represent the network edge and in networking this should be something that is used to link Network Elements together
- Path - a sequence network elements and links
- Interface - to create a network there needs to be a flexible way to specify interfaces that define how the elements/links that make up a path can be connected together
Resolution - Define a network modelling language that can describe the network parts as well as the build goals and template.
The defined language was - an5 (a nETOWK LANGUAGE WITH 5 CLASSES). This would then allow specification of components and interface compatibilities. The interface compatibilities are used to determine how the network can be built by plugging together the building blocks. The an5 language structure allows specification of an interface including cardinality and signatures:
- common - the common signature definition (plug = rj45)
- provides - what the interface provides (sex = female)
- needs - what the interface needs (sex = male)
- service - what service will be enabled as a result of successfully binding against an interface
The an5 language was derived from Java including similar semantics for interfaces and class inheritance, but language is mostly declarative, without need to implement methods (though this might change with need to add resource allocation behaviours).
Key Use Case - be able to write network specification using an5, compile this and then use generated package to define network build "programs"
The following is an example of an an5 specification for:
- Computer with PCIe slots
- NIC with Ethernet Ports
- Ethernet Switch
- cat6 cables
/*
What: A small test example of set of classes
that can be used to assemble a data centre network.
It is example of set of a nETWORK LANGUAGE WITH 5 CLASSES
definitions, not instances.
By: John Hartley - Graphica Software/Dokmai Pty Ltd
*/
package an5.generic.dctypes;
@Model(
spec = "an5",
version = "0.2"
)
interface pcie_interface {
common = { "type=pcie", "width=(1|4|8|16)", "gen=([1-4]\\.*)"};
}
interface pcie_slot extends pcie_interface {
needs = {"form=(card)?"};
provides = {"form=slot"};
binding = "slot-%I+1";
string name;
}
class computer extends element exposes pcie_slot {
reflects pcie_slot[] slot;
string name;
}
interface rj45_plug {
common = { "plug=rj45"};
}
interface baset_sink extends rj45_plug {
needs = { "cable=(cat([3-8].?))?",
"gender=male", "media=copper"};
provides = {"plug=rj45", "gender=female"};
}
interface ethernet_port_baset extends baset_sink {
common = { "service=ethernet"};
binding = "p-%I+1";
string name,
MAC;
}
interface ethernet_lag {
common = { "service=(ethernet)+" }
binding = "lag-%I";
string name;
}
interface ethernet_vlan {
common = { "service=(ethernet_vlan){0,4096}" }
needs = { "service=ethernet" };
binding = "vlan-%I";
string name;
}
class ethernet_lag_link extends link exposes ethernet_lag {
reflects ethernet_port_base_t[] ports;
}
class ethernet_vlan_link extends link exposes ethernet_lag, ethernet_vlan {
reflects ethernet_lag lags[];
reflects ethernet_port_base_t ports[];
}
interface pcie_card extends pcie_interface {
needs = {"form=(slot)?"};
provides = {"form=card"};
}
class pcie_nic extends element exposes pcie_card, ethernet_port_baset {
reflects pcie_card card;
reflects ethernet_port_baset port[];
}
class switch extends element exposes ethernet_port_baset, ethernet_vlan, ethernet_lag {
reflects ethernet_port_baset port[];
reflects ethernet_vlan vlan[];
reflects ethernet_lag lag[];
string name;
}
interface catx_link extends rj45_plug {
provides = {"cable=(cat([3-8].?))?", "gender=male",
"media=copper"};
needs = {"plug=rj45", "gender=female"};
}
interface cat6_link extends catx_link {
provides = {"cable=cat6"};
}
class cat6_cable extends link exposes catx_link {
string type = "physical",
colour,
label;
int length;
reflects catx_link a, b;
}
/* new 0.2 spec */
goal class ethernet_lan extends network {
@mandatory switch[] fabric;
@mandatory computer[] hosts;
object[] uses;
service = { "ethernet", "(ethernet_vlan)*" };
handler = "an5.solve.an5CreateNetwork";
constraint class ethernet_node extends element {
@mandatory computer host;
@mandatory switch ether;
object[] uses;
}
}
/* old 0.1 spec
abstract class ethernet_lan extends network {
@mandatory switch[] fabric;
@mandatory computer[] hosts;
object[] uses;
service = { "ethernet", "(ethernet_vlan)*" };
}
abstract class ethernet_node extends element {
@mandatory computer host;
@mandatory switch ether;
object[] uses;
} */
The compiler takes (Java like) an5 specification as input and will generate compiled code for efficiency. As the an5 syntax specification is very Java like this means it it easy and possible to generate Java code directly and avoid having to create an intermediary meta-model for an5 representation (see Decision #6). This has benefits in ease of debugging models but also in overall performance of solve engine.
Decision #3 - Need to seperate out the generic CSP solver and its search algorithms from the network domain level problem reduction logic, combinatorial generators, scoring and domain knowledge based optimisation.
Resolution - Create generic AND/OR solver (see Nilsson [1] & Winston [2]) for main control structure and have this invoke domain level template. The class level structure is illustrated here:
Example Use Case - Create Network by connecting 3 computers to 1 switch/router:
- R(n) = J(R(switch/router), 3 x Computer)) - "create resulting network R(n) by joining J(network, 3 x elements) the set of elements to an initial network R(switch/router)" or
- R(n) = C(J(R(switch/router), 1 x Computer), J(R(switch/router), 1 x Computer), J(R(switch/router), 1 x Computer))) - "create a network by connecting C(J(R(n)), J(R(n)) ..) multiple networks created by joining J(R(switch/router), 1 x Computer) a set of individual networks with have a common central hub (switch/router).
This shows that there are alternate ways to reduce the initial problem into a set of smaller problems. In the top case the result is an OR tree only, while in the lower case the result is an AND/OR tree structure. In general the solution will require AND/OR based resolutions. Research has also found that using AND/OR can result in significant reduction in the search space size in comparison to using OR tree solvers only (see Dechter [3][4]).
Decision #4 - Need to be able to provide delayed resource allocation as a result of binding to interfaces. This is essential in context of using AI search as allocation of resource at the time of initial binding could result in consumption of resources prior to determination that the search path is a successful one. So need to delay allocation until the search completes with set of valid results and only then commit resources from the finite pool. Typical example is things like IP address or VLAN Id allocation.
Resolution - Define state model for interface binding that allows for binding to be "un-committed" unless explicitly committed.
The binding state model is:
- "Open" => "Base Match" == un-committed Binding against base definition with no open unmatched items
- "Open" => "Reflecting" == un-committed partial Binding against base definition or uncommitted Binding with open unmatched items, with resulting open items and/or service outcome being re-exposed as new interface
- "Base Match" or "Reflecting" => "Committed" == commit previously un-committed Binding, doing any required resource allocation and remove any Reflecting interface from available pool
The Interface and Binding model is critical to allowing specification of virtual services such as Ethernet VLAN and aggregate links such a Ethernet LAG.
NOTE #1: That the semantics of network service such as Ethernet VLAN or Ethernet LAG is coded as an5 language specification and not through the solver runtime (see Decision #5)
Decision #5 - Network protocol interoperability and interface compatibility should only be specified through an5 language specification and not built into the an5 runtime engine
Resolution - provide Interface and associated binding mechanism that:
- allow rich specification of "plug-ability" across the components and
- allow instantiation of new services and protocols as a result of binding across these interfaces and
- support being able to dynamically add new "binding with associated service & protocols" back into the set of available interfaces for further level of "plug-ability".
The other mechanism included in Interface binding is use of "regular expressions" to allow variations across interfaces for compatibility variations (i.e. PCIe Gen 1, 2, 3 &4).
Key Use Cases - Use the interface and binding to support recursive layering of an5 interface and binding to achieve protocol layering such : as physical to data link layer and ethernet to IP layering.
Here is a sample of a set of matching interface / binding specifications and classes that use them:
// A set of an5 interface spec
// for baseT ethernet with VLAN and LAG support
//
interface rj45_plug {
common = { "plug=rj45"};
}
interface base_t_sink extends rj45_plug {
needs = { "cable=(cat([3-8].?))?", <<== Regular expressions
"gender=male", "media=copper"};
provides = {"plug=rj45", "gender=female"};
}
interface ethernet_port_base_t extends base_t_sink {
common = { "service=ethernet"}; <<== Service outcome
binding = "p-%I+1"; <<== Naming substitution rules
string name,
MAC;
}
interface ethernet_lag {
common = { "service=(ethernet)+" } <<== Binding cardinality
binding = "lag-%I";
string name;
}
interface ethernet_vlan {
common = { "service=(ethernet_vlan){0,4096}" } <<== Binding cardinality
needs = { "service=ethernet" };
binding = "vlan-%I";
string name;
}
class ethernet_lag_link extends link exposes ethernet_lag {
reflects ethernet_port_base_t[] ports;
}
class ethernet_vlan_link extends link exposes ethernet_lag, ethernet_vlan {
reflects ethernet_lag lags[];
reflects ethernet_port_base_t ports[];
}
NOTE #2: in defining the interface/binding specification I reviewed Canonical JuJu which provides a means of defining component dependencies through interfaces. However JuJu did have have sufficient control mechanisms and could not accommodate the recursive exposure mechanism required to allow reflection of bound interface to participate in further protocol layers and aggregation cases.
Decision #6 - In deciding to use a specification language (Decision #3) for problem space specifications, there is need to determine how to represent this within the application layer implementation. The typical approach is to define a meta-model representation within the application layer and then create an interpreter to operate on this. As an5 provides an object oriented specification language leveraging Java, the alternative was to leverage the Java meta-model (as used by underlying Java Virtual Machine).
Resolution - Leverage Java meta-model and JVM, as this already supports using compiled code for performance and has an introspection interface to allow runtime interpretation of generated models (from compiler). Another candidate considered was to use JavaScript, however this result in use of fully interpreted model, which in context of very large combinatorial and search space calculations would have significant performance impact.
Decision #7 - More clearly distinguish the Goal specification from the network componentry parts. This was based on feedback that is was hard to see how an5 helped with specification of "target". This was related to an5 leveraging Java and not wanting to introduce to many language elements.
Resolution - Add "goal", "constraint" and "handler" keyboards (and retire Java "abstract") so that is very easy and simpler to specify goal (an5 v0.2 - March 2023). See example here:
/* new 0.2 spec */
goal class ethernet_lan extends network {
@mandatory switch[] fabric;
@mandatory computer[] hosts;
object[] uses;
service = { "ethernet", "(ethernet_vlan)*" };
handler = "an5.solve.an5CreateNetwork";
constraint class ethernet_node extends element {
@mandatory computer host;
@mandatory switch ether;
object[] uses;
}
}
/* old 0.1 spec
abstract class ethernet_lan extends network {
@mandatory switch[] fabric;
@mandatory computer[] hosts;
object[] uses;
service = { "ethernet", "(ethernet_vlan)*" };
}
abstract class ethernet_node extends element {
@mandatory computer host;
@mandatory switch ether;
object[] uses;
} */
Architecture
The architecture for an5, was developed iteratively and not top down:
- The first step was some paper based network definitions and instance models, to try to get idea of how to represent the problem.
- Next there was an attempt to represent these with JSON, then
- Use a specific language to represent the problem space.
As a result of this it became apparent that there was need to do some prototype development and testing to validate the overall ambition for "Intelligent Network Design", as it was to easy to miss the hidden details that needed to be dealt with and get missed in just doing architecture.
So the set of needs and decision outlined above, arose out of initial analysis and practical verification with principle of using "Just Enough Architecture" ;-) .
The overall goal for an5 are to provide:
- a general purpose network representation language and model to support planning, design, simulation and ultimately automated provisioning of IT & network platforms and services (see Cloud Automation/Life Cycle Management diagram in introduction)
- intelligent solvers that can "build networks", "connect to networks", "join networks together", "connect elements within network" and "find paths across network"
- provide network builder that allows fast "snap together" network assembly capabilities that can act as seed into intelligent network design solvers
The basic architecture packaging consists of:
- an5 - a nETWORK LANGUAGE WITH 5 CLASSES - the core (and still evolving) network specification and template language
- compiler - the an5 compiler generates Java code for the models. The generated models are packaged into Java class archives
- model - the core an5 Java class library, which contains the base classes inherited by the generated an5 classes
- solver - the Intelligent Network Design solvers
- cases - the case analysis and repository to support case detection for use in providing search optimisation by solvers
- learner - statistics and correlation analysis to identify and put probabilities against network build cases
For the initial testing and validation the focus is on: compiler, model and solver, which form the MVP core of framework.
In creating the prototype solver the architecture has had to leverage a number of techniques and disciplines which I will outline in the Compiler and Solver package summaries.
Compiler
The compiler implementation is based on antlr. Antlr provides a mature Java based lex/yacc style compiler generator. Antlr generates a parse tree and supports tree walking call backs for symbol table build and code generation.
As per Decision #2 an5 is Java derived and so could be implemented quickly re-using (hacking ;-) ) an existing antlr grammar for Java (significantly reduced), rather than starting from scratch.
Solver
To implement any real world AI solution requires using a large number of techniques. The an5 prototype implementation - required assembling most of the basis AI building blocks as outlined in classic AI text "Artificial Intelligence by Patrick Henry Winston" [2] including:
- "Semantic Nets" for representation,
- "Generate-and-Test Method" for combinatorial generator. This initial proof has three generators,
- which use "The Problem Reduction Method" to take bigger network design problem and break it down into smaller set of sub-problems, which in turn are represented by AND/OR trees (see Decision #3)
- "Blind Methods", "Heuristically Informed Methods", "The Best Path", "Redundant Paths" search method are all used to help manage the combinatorial search space that is always lurching when there is need to search for potential solutions (formally NP-Complete problems).
When using an AI solver the generator may create sets of cases that are different from initial expectations. This was the case for the network design generation. To help understand the solver behaviour it was instrumented to collect run-time statistics as well has having skeleton "why" and "what" solution behaviour introspection.
Summary
The initial prototype only implements a portion of outlined architecture (compiler, model & solver).
While the current implementation is based on classic queue based Solver, it would be relatively easy to use AND/OR problem breakdown to distribute solver process across a distributed set of hosts to support solving "big problems"
The key question that needs to be addressed before implementing an "Intelligent Network Design" solution is to ensure that the problem is not intractable, due to combinatorial explosion.
To check this a paper based analysis and implementation testing was done, using a simple network design as outlined in the following "The Monkey and Banana - Test Case" and "Tractability - Combinatorial Analysis and Optimisation" sections.
The Monkey and Banana - Test Case
To test the an5 representation and solver I used a network variation of "the monkey and banana" problem as observed by Wolfgang Kohler, back in 1917 [15].
In the monkey and banana problem you must be able to solve the problem of "how can the monkey get the bananas?". The problem space has set of rooms with boxes, a monkey, a stick and a bunch of bananas that are too high to reach directly. To get the bananas the monkey needs to:
- move boxes under the banana
- stand on boxes with stick
- reach the bananas with the stick
The network equivalent as to connect 3 computers (boxes) to the network switch (bananas):
I selected this case as:
- it had the situation which is analogous to the "monkey and banana" case against which the original STRIPS[13] constraint based solver was tested,
- but with an additional layer of combinatorial generation that is needed in network domain
- dealt with simple physical and logical interface cases to allow test and development of this domain layer
- it was sufficiently non-trivial to allow testing against different search algorithms including Depth First, Breadth First, Branch and Bound, A*, British Museum (depending on search space size) and AND/OR tree based search
- it required problem reduction method to be applied
The test implementation provides the following search options (see Winston [2]) and domain optimisation choices:
public class an5SearchControl {
public static class SearchOptions { public static final int DEPTH = 0x001,
BREADTH = 0x002, BOUND = 0x004, SCORE = 0x008,
COST = 0x010, REMOVE_LOCAL_EQUIVALENTS = 0x020,
BIND_ALL = 0x040, BIND_UNIQUE = 0x080,
THREAD = 0x100, TIMEBOX = 0x200,
OPTIMAL = 0x400, KEEP_REMOVED = 0x800; }
....
}
Tractability - Combinatorial Analysis and Optimisation
The basic building blocks of AI problem solving using problem reduction and search are need for:
- Set of basic rules to allow problem reduction - to break bigger problem into set of smaller ones
- Generator - a combinatorial generator that should provide each of possible solutions for a given state of the problem solving and
- Search Engine - that controls the application of problem reduction, invokes generator and manages search for solutions.
The big enemy of AI search is combinatorial explosion, where the number of search alternates is to large to handle, making the problem intractable. To counter this the generator and search engine do not generate all solutions at once. Rather it generates a set of candidates and then tests and trims these so only a subset of the total number of search paths are expanded.
Even the simple test case outlined above would be intractable without apply some level of optimisation. Optimisations fall into a number of categories:
- Generic Search Control - applicable to any search space problem and operate by providing alternate search algorithms including: Depth First Search, Breadth First Search, Branch and Bound, Score & Cost based Search
- Domain Optimised - where the search optimisation if based on applying some pruning based on the problem domain of the search
- Case Based - where search is optimised based on historical or statistical analysis of paths which have most likelihood to provide preferred or right solution. These are ones which leverage machine learning techniques.
The other aspect of search based solutions is that with large search spaces you can not guarantee that you will find the optimal or best solution and so search is finished on finding the first solution or constrained to run for a fixed time.
In the case of above our "monkey and banana test" the set of combinations that need to be generated is equal to:
Combinations to generate = N(h) x N(s) x N(n) x N(p) x N(c)
where:
- H(h) - numbers hosts
- N(s) - number of PCI slots per host
- N(n) - number of NICs
- N(p) - number ports per NIC
- N(c) - number of cables
This is a lot of multipliers ... here what it looks like if we go from 1 host + nic -> 12 hosts + nics:
So even with the seemingly trivial "monkey + banana" network at 12 hosts the problem becomes intractable to comprehensive search.
In practice you can solve with 8 hosts but at 9 hosts memory allocation fails, due to needs of generator and at 10 hosts the size of combination space results in Java int (32 bit integer) overflow. At 7 & 8 hosts the solver spends all its time just generating the combinations, not searching them.
NOTE: you cannot have an array index of type long (64 bit integer), as it is not possible to allocate a block of memory this large.
Looking at the combinatorial behaviour shows that the largest number combination sets comes with "cable to NIC card" step, after which problem becomes trivial (direct connection of cable to switch) again:
As expected from the simple network topology the best search strategy in this case is just a "Depth First Search" as both best Scored and Cost based search end up selecting Depth First Path and so just add the overhead of calculation.
NOTE: That in general the network construction search cannot come up with an optimal scoring or cost algorithm as the estimate for "distance" to solution is never know prior to reaching a solution.
So without some other optimisation the network build is intractable.
In observing the behaviour it was apparent that the AI Solver was doing things that would not be required in real world. In particular the generator was doing a great job of building all the combination of ways that you could plug the various compatible interfaces together. So for 1 Computer it would generate every next solution combination set of:
C(h,n) = (h1-n1, h1-n2, h1-n3 ... h1-nX, h2-n1, h2-n2, h3-n3 ... h2-nX, .... hY-nX) where:
- C(h,n) - set of combination of hosts (h) and NICs (n) consisting of pairs:
- h1-n1 .. - host #1 with NIC #1, host #1 with NIC #2 etc
In additional it was also generating combinations of multiple NIC cards within single host (as each host was defined as having 4 slots).
In practice in the real world NICs and cables are treated as "commodity" devices and the result of assembling a path using any particular instance of an object of the same class of will result in the same outcome, irrespective of which instance I use.
In simple terms this means that if I have 10 Cat6 cables available to plug from a NIC port to a switch port then there is no advantage to generating all the 10 NIC port to cable to switch port combinations, as the result is same for each of the very large number of combinations. So instead you can just generate the first instance and make the cable un-available for further combinations.
To test this I created a "Local Equivalent Removal" option within the Domain solver layer. The algorithm was "Local" as it operated solely within scope of a given interface bind request and removed cases of binds which have class template equivalency.
If you do not prune the "Local Equivalents" the result is a huge number of essentially identical search trees. Avoidance and/or caching for reuse of these cases is described in Dechter[4] as part of AND/OR search analysis.
Practically the use of "Local Equivalent Removal" was to stop the combinatorial explosion in its tracks with each new depth step now having breadth of 1. So the Domain optimisation results in a reduction from 10 to power of 12 (=power(10,12) in Excel) to 1. Thats a better return than on BitCoin !
This is a huge reduction and illustrates how application of domain knowledge takes the problem from intractable to solvable in trivial time (less than a second in practice).
There is a lot of scope for application of Domain knowledge in this context as well as Case Based optimisation.
Importantly there is good reason to believe that solving network design automation using AI techniques will yield positive results.
Conclusion & Next Steps
The an5 language provides a way to specify both network building blocks and templates. The generated class library is then easily used to "write" a network solver. The simple solver definition below was used for the "Monkey and Banana" testing. This is now mostly a declarative exercise, using the an5 device library specification shown in Decision #2 (above):
/**
@what An example network create test file
@author John Hartley - Graphica Software/Dokmai Pty Ltd
*/
import java.util.ArrayList;
import java.util.List;
import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.JsonMappingException;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.SerializationFeature;
import com.fasterxml.jackson.databind.module.SimpleModule;
import an5.model.*;
import an5.solve.*;
import an5.generic.dctypes.*;
public class BuildMiniNetwork {
public static void main(String[] args){
/* lots of Jackson JSON Serialize/DeSerialize initialisation */
...
...
/* Switch */
String swdef = "{\"an5name\":\"switch\",\"name\":\"simple-switch\",\"reflects\":[{\"name\":\"port\",\"policy\":\"STATIC\",\"size\":24}]}";
/* 12 x Computer */
String[] cpdef = {"{\"an5name\":\"computer\",\"name\":\"hal-1\",\"reflects\":[{\"name\":\"slot\",\"policy\":\"STATIC\",\"size\":4}]}",
"{\"an5name\":\"computer\",\"name\":\"hal-2\",\"reflects\":[{\"name\":\"slot\",\"policy\":\"STATIC\",\"size\":4}]}",
...
...
"{\"an5name\":\"computer\",\"name\":\"hal-12\",\"reflects\":[{\"name\":\"slot\",\"policy\":\"STATIC\",\"size\":4}]}"};
/* 12 * NIC */
String[] nicdef = {"{\"an5name\":\"pcie_nic\",\"reflects\":[{\"name\":\"port\",\"policy\":\"STATIC\",\"size\":2}]}",
"{\"an5name\":\"pcie_nic\",\"reflects\":[{\"name\":\"port\",\"policy\":\"STATIC\",\"size\":2}]}",
...
...
"{\"an5name\":\"pcie_nic\",\"reflects\":[{\"name\":\"port\",\"policy\":\"STATIC\",\"size\":2}]}"};
/* 13 * Cat6 */
String[] cabdef = {"{\"an5name\":\"cat6_cable\",\"type\":\"physical\",\"length\":100};",
"{\"an5name\":\"cat6_cable\",\"type\":\"physical\",\"length\":100};",
...
...
"{\"an5name\":\"cat6_cable\",\"type\":\"physical\",\"length\":100};"};
AN5CL_switch sw1 = null;
AN5CL_computer cp[] = {null, null, null, null, null, null, null, null, null, null, null, null};
AN5CL_pcie_nic nic[] = {null, null, null, null, null, null, null, null, null, null, null, null};
AN5CL_cat6_cable cab[] = {null, null, null, null, null, null, null, null, null, null, null, null, null};
try {
sw1 = new AN5CL_switch(new ObjectMapper().readTree(swdef));
for (i = 0; i < cpdef.length; i++) {
cp[i] = objectMapper.readValue(cpdef[i], AN5CL_computer.class);
}
for (i = 0; i < nicdef.length; i++) {
nic[i] = objectMapper.readValue(nicdef[i], AN5CL_pcie_nic.class);
}
for (i = 0; i < cabdef.length; i++) {
cab[i] = objectMapper.readValue(cabdef[i], AN5CL_cat6_cable.class);
}
} catch (JsonMappingException e1) {
// TODO Auto-generated catch block
e1.printStackTrace();
} catch (JsonProcessingException e1) {
// TODO Auto-generated catch block
e1.printStackTrace();
}
an5Object[] use = {sw1,
cp1, cp2, cp3, cp4, cp5, cp6,
cp7, cp8, cp9, cp10, cp11, cp12,
nic1, nic2, nic3, nic4, nic5, nic6,
nic7, nic8, nic9, nic10, nic11, nic12,
cab1, cab2, cab3, cab4, cab5, cab6,
cab7, cab8, cab9 , cab10, cab11, cab12, cab13};
List parts = new ArrayList<>();
for (an5Object ob : use) parts.add(ob);
AN5TP_ethernet_lan netPrototype = new AN5TP_ethernet_lan();
an5Network netResult = (an5Network)netPrototype.createInstance();
an5Template netTemplate = new an5CreateNetwork(null, netPrototype, parts, netResult);
an5SearchControl ctrl = new an5SearchControl();
int offFlags3 = an5SearchControl.SearchOptions.COST;
int onFlags1 = an5SearchControl.SearchOptions.DEPTH;
ctrl.turnOff(offFlags3);
ctrl.turnOn(onFlags1);
an5Goal makeNet = new an5Goal(netTemplate, ctrl);
int res = makeNet.solve();
System.out.println("Result was: " + ctrl.resultString(res));
ctrl.dumpStats(System.out);
for (an5GoalTree t: makeNet.found) {
an5FoundGoal resGoal = (an5FoundGoal)t;
if (resGoal.goal instanceof an5JoinNetwork) {
an5JoinNetwork jn = (an5JoinNetwork)(resGoal.goal);
an5Network no = jn.joinNet;
no.dumpJSON(System.out);
}
}
System.out.println("switch reports as" + sw1.toString());
}
}
NOTE: All classes with name AN5TP_ are generated templates and all classes with names AN5CL_ are generated classes. The classes with prefix "an5" are from the model or solver class library.
As the code is mostly declarative it could be readily used to provide a more complete "Intelligent Network Design" automation tool by building an interpretive interface on top of this to support a web UI.
The initial an5 implementation has successfully proven the concept of being able to provide an "Intelligent Network Design" using classic AI representation, combinatorial generation and AND/OR solver with flexible control.
Of particular importance is that it shows that applying a very simple and protocol neutral optimisation you can take the problem from intractable to solvable in near real time.
In practice this means that you it is viable to apply AI to classic fulfilment problems as part of re-thinking the approach to fulfilment.
Using features such as:
- Case Based Heuristic where you could constrain the search size to find solutions to larger design problems by using reference case sets
- Allowing solver to run for bounded time you could find multiple solutions, then select "best" and "preferred" solution and convert these to templates automatically by converting from solution instance to class based templates
- These class based template in turn can be feed into "Case Based Heuristics"
In this initial proof of concept only the "Create Network" and "Connect Element" solvers were used with an OR tree control framework.
The prototype implementation is around 7,000 lines of Java and Antlr code and the example "dc-types" specification generated code is around 500 lines of Java.
Next steps could include:
- Adding additional solvers
- Complete interface specification "regular expression" binding
- Add partial interface binding and service reflection to support layered protocols implementation
- Refine an5 language syntax to include interface specification details (currently these are just sets of name/value pair strings and not validated by compiler)
- Refine an5 language to support nested templates target network specifications (ie specify network outcome + connectivity constraint templates within single template class - initial update done with an5 V0.2 March 2023)
- Add full AND/OR solving support
- Add case based heuristic support
- Add automatic template generation for application in search optimisations. This would meet goal of being able to provide behaviour consistent with existing template / parameter (+ process) fulfilment systems. With solver based solution there would not be need for explicit process definition, as process would be based on dependency needs.
- Add JSON dump/load classes (using Jackson or similar library) (done Jan 22)
- Add distributed solver and control framework
- Provide UI based network builder leveraging interface / binding model to allow snap together designs based on an5 device libraries
- Test an5 with more network models and build up set of device and network model libraries
- Add interface to device layer so an5 can plug into lower level technologies such as YANG for configuration, JuJu for automated platform building and TOSCA based orchestration engines.
So prototype has proven the viability of the concept, but there is still lots to do. I have put the code up on github and believe that an5 is good candidate for Open Source development. So if there are other programming and network design savvy people who would like to help flesh out the framework, please muck in ;-) .
"Intelligent Network Design" is needed to help telco improve their speed to market and make significant impact in reducing cost for network design and fulfilment systems development.
References and Links:
[1] Principles of Artificial Intelligence - Nils J. Nilson – defined AND/OR tree problem representation
[2] Artificial Intelligence (3rd Ed) - Patrick Henry Winston - creating an5 and its solver required piecing together most of the parts (representation, generate & test, search) outlined in the first 5 chapters of this classic AI text. You can also see Patrick's lecture that cover the same topics on MIT OpenCourseWare videos
[3] Reasoning with Probabilistic and Deterministic Graphical Models - Exact Algorithms (2nd Edition) - Rina Dechter
[4] AND/OR Tree Search for Constraint Optimization - Radu Marineescu & Rina Dechter , paper on impact of introducing AND/OR into search
[5] Game-Tree Search Using Proof Numbers: The First Twenty Years - Akihiro Kishimoto, Mark H.M. Winands, Martin Muller & Jahn-Takeshi Saito
[6] Network Design Automation - Treating Networks like Programs and Chips -George Varghese - other people are thinking about this...
[7] SAT v CSP - Toby Walsh - paper provides review of semantics and mapping between Propositional Satisfiability (SAT) and Constraint Satisfaction Problems
[8] Juju - Canonical Infrastructure Automation
[9] Compilers - Principles, Techniques and Tools - Aho, Sethi & Ullman
[10] Modern Compiler Implementation in Java - Appel
[11] lex & yacc (2nd Ed)- John R Levin, Tony Mason & Doug Brown
[12] The Java Language Specification, Third Edition
[13] STRIPS: A New Approach to the Application of Theorem Proving to Problem Solving - Richard E. Fikes & Nils J. Nilsson - original paper on STRIPS (STanford Research Institute Problem Solver) constraint based solver
[14] Network Simulation Tools Survey - Mrs. Saba Siraj, Mr. Ajay Kumar Gupta & Mrs Rinku-Badgujar
[15] The Mentality of Apes - Wolfgang Kohler - the source of the "monkey and banana" problem and assumed source of the title picture.
[16] "Telco Inventory & Topology - Enabler for Automation" - provides a PoV on role of inventory in core Telco processes
[17] OASIS Tosca - Topology and Orchestration Specification for Cloud Applications
[18] Netconf/YANG (IETF RFC 6020, 7950 and others), and Cisco/Tail-f - YANG is a data definition language for network configuration. Cisco / Tail-f have leveraged this to provide an intent based orchestration offering
[19] Wolfgang Köhler’s the Mentality of Apes and the Animal Psychology of his
Time - Gabriel Ruiz - provide a review of Kohler's work
[20] Is CBR a Technology or a Methodology? - Ian Watson - provides a survey of "Case Based Reasoning" which uses case database to drive problem solving
[21] an5 - github respository
[22] an5 - presentation to Linux Foundation Networking / GSMA - Anuket Reference Model team
NOTE: The "monkey and banana" image can be found via google search "wolfgang kohler monkey photo", however I cannot find definitive paper/book from which this was taken.