Google has announced the general availability of AI Platform Prediction. The service is based on a Google Kubernetes Engine backend and enables developers to build, deploy, and manage machine learning models in the cloud.
Google claims the new backend architecture is designed for improved reliability, greater flexibility through new hardware options, reduced overhead latency, and better tail latency.
In addition to standard features such as autoscaling, access logs, and request/response logging, which were already available during the beta period, Google has introduced several updates to improve usability.
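As a minimal sketch of how these options might be set when creating a model version through the AI Platform v1 REST API's Python client: the project, bucket, and BigQuery table names below are placeholders, and the exact field values should be checked against the API reference.

```python
from googleapiclient import discovery

# Build a client for the AI Platform (ml.googleapis.com) v1 API.
ml = discovery.build('ml', 'v1')

# Placeholder project and model names for illustration.
parent = 'projects/my-project/models/my_model'

version_body = {
    'name': 'v1',
    'deploymentUri': 'gs://my-bucket/model/',  # exported model artifacts
    'runtimeVersion': '2.1',
    'framework': 'SCIKIT_LEARN',
    'pythonVersion': '3.7',
    'machineType': 'n1-standard-4',
    # Autoscaling: keep at least one node warm to avoid cold starts.
    'autoScaling': {'minNodes': 1},
    # Request/response logging: sample 10% of online traffic into BigQuery.
    'requestLoggingConfig': {
        'bigqueryTableName': 'my-project.serving_logs.request_log',
        'samplingPercentage': 0.1,
    },
}

ml.projects().models().versions().create(parent=parent, body=version_body).execute()
```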
Many data scientists like the simplicity and power of XGBoost and scikit-learn models for predictions in production. AI Platform makes it simple to deploy models trained using these frameworks with just a few clicks. Google promises to handle the complexity of the serving infrastructure on the hardware of the user's choice.
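Once a version like the one above is serving, online predictions can be requested with the same client library; the project, model name, and feature values here are arbitrary placeholders:

```python
from googleapiclient import discovery

service = discovery.build('ml', 'v1')
# Omitting the version in the resource name selects the model's default version.
name = 'projects/my-project/models/my_model'

response = service.projects().predict(
    name=name,
    body={'instances': [[6.8, 2.8, 4.8, 1.4]]}  # one row of numeric features
).execute()

print(response['predictions'])
```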
Another noteworthy feature is Resource Metrics. An important part of maintaining models in production is understanding their performance characteristics, such as GPU, CPU, RAM, and network utilization. These metrics are now visible in the Cloud Console and Stackdriver Metrics for models deployed on Compute Engine (GCE) machine types.
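The same metrics can be read programmatically through the Cloud Monitoring (Stackdriver) client library. The sketch below assumes a metric type following the ml.googleapis.com naming scheme; the exact metric name should be verified against the published metrics list, and the project name is a placeholder.

```python
import time
from google.cloud import monitoring_v3

client = monitoring_v3.MetricServiceClient()

# Query the last hour of CPU utilization for deployed model versions.
now = int(time.time())
interval = monitoring_v3.TimeInterval(
    {"start_time": {"seconds": now - 3600}, "end_time": {"seconds": now}}
)

results = client.list_time_series(
    request={
        "name": "projects/my-project",  # placeholder project
        # Assumed metric type; check the ml.googleapis.com metrics list.
        "filter": 'metric.type = "ml.googleapis.com/prediction/online/cpu/utilization"',
        "interval": interval,
        "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
    }
)

for series in results:
    for point in series.points:
        print(series.metric.labels, point.value.double_value)
```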
Google has also introduced new endpoints in three regions (us-central1, europe-west4, and asia-east1) with better regional isolation for improved reliability. Models deployed on the regional endpoints stay within the specified region.
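A regional endpoint is selected by overriding the client's API endpoint. A minimal sketch, assuming the europe-west4 endpoint follows the documented REGION-ml.googleapis.com pattern and using a placeholder project:

```python
from googleapiclient import discovery

# Point the client at a regional endpoint instead of the global one.
client_options = {'api_endpoint': 'https://europe-west4-ml.googleapis.com'}
ml = discovery.build('ml', 'v1', client_options=client_options)

# Requests through this client only see models deployed in europe-west4.
models = ml.projects().models().list(
    parent='projects/my-project'  # placeholder project
).execute()
print(models)
```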