Search results for `serverless`

[USENIX '24] Harmonizing Efficiency and Practicability: Optimizing Resource Utilization in Serverless Computing with Jiagu

  dblp   doi

Abstract

Current serverless platforms struggle to optimize resource utilization due to their dynamic and fine-grained nature. Conventional techniques like overcommitment and autoscaling fall short, often sacrificing utilization for practicability or incurring performance trade-offs. Overcommitment requires predicting performance to prevent QoS violations, introducing a trade-off between prediction accuracy and overhead. Autoscaling must scale instances quickly in response to load fluctuations to reduce resource wastage, but more frequent scaling also incurs more cold start overhead. This paper introduces Jiagu to harmonize efficiency with practicability through two novel techniques. First, pre-decision scheduling achieves accurate prediction while eliminating overhead by decoupling prediction and scheduling. Second, dual-staged scaling achieves frequent adjustment of instances with minimal overhead. We have implemented a prototype and evaluated it using real-world applications and traces from the public cloud platform. Our evaluation shows a 54.8% improvement in deployment density over commercial clouds (with Kubernetes) while maintaining QoS, 81.0%–93.7% lower scheduling costs, and a 57.4%–69.3% reduction in cold start latency compared to existing QoS-aware schedulers.

[USENIX '24] ALPS: An Adaptive Learning, Priority OS Scheduler for Serverless Functions

  dblp   doi

Abstract

FaaS (Function-as-a-Service) workloads feature unique patterns. Serverless functions are ephemeral, highly concurrent, and bursty, with an execution duration ranging from a few milliseconds to a few seconds. The workload behaviors pose new challenges to kernel scheduling. Linux CFS (Completely Fair Scheduler) is workload-oblivious and optimizes long-term fairness via proportional sharing. CFS neglects the short-term demands of CPU time from short-lived serverless functions, severely impacting the performance of short functions. Preemptive shortest job first—shortest remaining process time (SRPT)—prioritizes shorter functions in order to satisfy their short-term demands of CPU time and, therefore, serves as a best-case baseline for optimizing the turnaround time of short functions. A significant downside of approximating SRPT, however, is that longer functions might be starved. In this paper, we propose a novel application-aware kernel scheduler, ALPS (Adaptive Learning, Priority Scheduler), based on two key insights. First, approximating SRPT can largely benefit short functions but may inevitably penalize long functions. Second, CFS provides necessary infrastructure support to implement user-defined priority scheduling. To this end, we design ALPS to have a novel, decoupled scheduler frontend and backend architecture, which unifies approximate SRPT and proportional-share scheduling. ALPS’ frontend sits in the user space and approximates SRPT-inspired priority scheduling by adaptively learning from an SRPT simulation on a recent past workload. ALPS’ backend uses eBPF functions hooked to CFS to carry out the continuously learned policies sent from the frontend to inform scheduling decisions in the kernel. This design adds workload intelligence to workload-oblivious OS scheduling while retaining the desirable properties of OS schedulers. We evaluate ALPS extensively using two production FaaS workloads (Huawei and Azure), and results show that ALPS achieves a reduction of 57.2% in average function execution duration compared to CFS.
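
To make the frontend idea concrete, below is a minimal sketch (not ALPS's implementation) of learning priorities from an SRPT simulation: replay a recent trace of invocations under single-CPU SRPT, measure each function's mean turnaround, and map shorter-turnaround functions to higher priority levels that a backend could then enforce. The trace format and the number of priority levels are assumptions for illustration.

```python
import heapq
from collections import defaultdict

def srpt_simulate(trace):
    """Single-CPU SRPT simulation over jobs given as (arrival_s, duration_s, fn).
    Returns each function's mean turnaround time under SRPT."""
    trace = sorted(trace)                      # by arrival time
    ready, turnaround = [], defaultdict(list)  # ready: heap of (remaining, arrival, fn)
    t, i = 0.0, 0
    while i < len(trace) or ready:
        if not ready:                          # CPU idle: jump to the next arrival
            t = max(t, trace[i][0])
        while i < len(trace) and trace[i][0] <= t:
            arr, dur, fn = trace[i]
            heapq.heappush(ready, (dur, arr, fn))
            i += 1
        rem, arr, fn = heapq.heappop(ready)
        next_arr = trace[i][0] if i < len(trace) else float("inf")
        run = min(rem, next_arr - t)           # run until done or preempted
        t += run
        if rem - run > 1e-12:
            heapq.heappush(ready, (rem - run, arr, fn))
        else:
            turnaround[fn].append(t - arr)
    return {fn: sum(v) / len(v) for fn, v in turnaround.items()}

def derive_priorities(trace, levels=10):
    """Shorter simulated turnaround -> lower (better) priority level."""
    mean_tt = srpt_simulate(trace)
    ranked = sorted(mean_tt, key=mean_tt.get)
    return {fn: idx * levels // len(ranked) for idx, fn in enumerate(ranked)}

trace = [(0.0, 0.005, "thumbnail"), (0.001, 2.0, "video-encode"), (0.002, 0.004, "auth")]
print(derive_priorities(trace, levels=4))      # shorter functions get lower levels
```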

[USENIX '24] StreamBox: A Lightweight GPU SandBox for Serverless Inference Workflow

  dblp   doi

Abstract

The dynamic workload and latency sensitivity of DNN inference drive a trend toward exploiting serverless computing for scalable DNN inference serving. Usually, GPUs are spatially partitioned to serve multiple co-located functions. However, existing serverless inference systems isolate functions in separate monolithic GPU runtimes (e.g., CUDA contexts), which are too heavy for short-lived and fine-grained functions, leading to high startup latency, a large memory footprint, and expensive inter-function communication. In this paper, we present StreamBox, a new lightweight GPU sandbox for serverless inference workflows. StreamBox unleashes the potential of GPU streams and efficiently realizes them for serverless inference by implementing fine-grained, auto-scaling memory management, allowing transparent and efficient intra-GPU communication across functions, and enabling PCIe bandwidth sharing among concurrent streams. Our evaluations over real-world workloads show that StreamBox reduces the GPU memory footprint by up to 82% and improves throughput by 6.7× compared to state-of-the-art serverless inference systems.

[USENIX '24] A Secure, Fast, and Resource-Efficient Serverless Platform with Function REWIND

  dblp   doi

Abstract

Serverless computing often utilizes the warm container technique to improve response times. However, this method, which allows function containers to be reused across different requests of the same function type, creates persistent vulnerabilities in memory and file systems. These vulnerabilities can lead to security breaches such as data leaks. Traditional approaches to these issues often suffer from performance drawbacks and high memory requirements due to extensive use of user-level snapshots and complex restoration processes. This paper introduces REWIND, an innovative and efficient serverless function execution platform designed to address these security and efficiency concerns. REWIND ensures that after each function request, the container is reset to an initial state free from any sensitive data, including a thorough restoration of the file system to prevent data leakage. It incorporates a kernel-level memory snapshot management system, which significantly lowers memory usage and accelerates the rewind process. Additionally, REWIND optimizes runtime by reusing memory regions and leveraging the temporal locality of function executions, enhancing performance while maintaining strict data isolation between requests. The REWIND prototype is implemented on OpenWhisk and Linux and evaluated with serverless benchmark workloads. The evaluation results demonstrate that REWIND delivers substantial memory savings while maintaining high function execution performance. In particular, the low memory usage allows more warm containers to be kept alive, improving both the throughput and the latency of function execution while preserving isolation between function requests.

[SIGCOMM '24] YuanRong: A Production General-purpose Serverless System for Distributed Applications in the Cloud

  dblp   doi

Abstract

We design, implement, and evaluate YuanRong, the first production general-purpose serverless platform with a unified programming interface, multi-language runtime, and a distributed computing kernel for cloud-based applications. YuanRong addresses many limitations of existing Function-as-a-Service (FaaS) systems, particularly their limited performance and missing features. First, our fast function system supports sub-millisecond function invocation and locality-aware hierarchical scheduling. Second, our multi-semantic built-in data system achieves object exchange latency of 200 microseconds, enabling end-to-end latency of 2 milliseconds for streaming elements at 5 Gbps throughput. Third, the extensible, portable Service Bridge bridges stateless and stateful operations, allowing connection reuse and distributed transactions, and offers a unified backend abstraction for multi-cloud portability. YuanRong has been deployed for over 3 years at Huawei across nearly 20 datacenter regions, processing up to 30 billion requests per day on more than 100,000 CPU cores, with a daily average CPU usage of 53%. It serves various serverless workloads, including enhanced FaaS services, microservices, data analytics, deep model training and serving, and some HPC workloads. Our experience shows that Spring-based microservices can be migrated to YuanRong within one day, reducing resource costs by 90%, demonstrating its generality and efficiency in supporting a broad spectrum of applications.

[OSDI '24] Sabre: Hardware-Accelerated Snapshot Compression for Serverless MicroVMs

  dblp   doi

Abstract

MicroVM snapshotting significantly reduces cold start overheads in serverless applications. Snapshotting enables storing part of the physical memory of a microVM guest into a file and later restoring from it to avoid long cold start-up times. Prefetching memory pages from snapshots can further improve the effectiveness of snapshotting. However, the efficacy of prefetching depends on the size of the memory that needs to be restored. Lossless page compression is therefore a great way to improve the coverage of the memory footprint that snapshotting with prefetching achieves. Unfortunately, the high overhead and high CPU cost of software-based (de)compression make this impractical. We introduce Sabre, a novel approach to snapshot page prefetching based on hardware-accelerated (de)compression. Sabre leverages an increasingly pervasive near-memory analytics accelerator available in modern datacenter processors. We show that by appropriately leveraging such accelerators, microVM snapshots of serverless applications can be compressed by up to a factor of 4.5×, with nearly negligible decompression costs. We use this insight to build an efficient page prefetching library capable of speeding up memory restoration from snapshots by up to 55%. We integrate the library with production-grade Firecracker microVMs and evaluate its end-to-end performance on a wide set of serverless applications.

[OSDI '24] ServerlessLLM: Low-Latency Serverless Inference for Large Language Models

  dblp   doi

Abstract

This paper presents ServerlessLLM, a distributed system designed to support low-latency serverless inference for Large Language Models (LLMs). By harnessing the substantial near-GPU storage and memory capacities of inference servers, ServerlessLLM achieves effective local checkpoint storage, minimizing the need for remote checkpoint downloads and ensuring efficient checkpoint loading. The design of ServerlessLLM features three core contributions: (i) fast multi-tier checkpoint loading, featuring a new loading-optimized checkpoint format and a multi-tier loading system, fully utilizing the bandwidth of complex storage hierarchies on GPU servers; (ii) efficient live migration of LLM inference, which enables newly initiated inferences to capitalize on local checkpoint storage while ensuring minimal user interruption; and (iii) startup-time-optimized model scheduling, which assesses the locality statuses of checkpoints on each server and schedules the model onto servers that minimize the time to start the inference. Comprehensive evaluations, including microbenchmarks and real-world scenarios, demonstrate that ServerlessLLM dramatically outperforms state-of-the-art serverless systems, reducing latency by 10-200× across various LLM inference workloads.
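
As a concrete illustration of the startup-time-optimized scheduling idea, here is a minimal sketch under assumed per-tier load bandwidths and GPU queueing times (all numbers are hypothetical, not measurements from the paper): estimate, per server, how long until the checkpoint is resident and a GPU is free, and place the model on the minimizer.

```python
# Hypothetical load bandwidths per storage tier (GB/s); measured per server in reality.
TIER_BANDWIDTH_GBPS = {"dram": 20.0, "ssd": 3.0, "remote": 0.8}

def estimated_startup_s(server, model_size_gb):
    """Estimated time until the model is ready to serve on `server`.
    server = {"tier": tier holding the checkpoint ('gpu' if already loaded),
              "queue_s": wait until a GPU on this server frees up}"""
    load_s = 0.0 if server["tier"] == "gpu" else model_size_gb / TIER_BANDWIDTH_GBPS[server["tier"]]
    return server["queue_s"] + load_s

def place(servers, model_size_gb):
    """Pick the server minimizing estimated time-to-first-inference."""
    return min(servers, key=lambda name: estimated_startup_s(servers[name], model_size_gb))

servers = {
    "A": {"tier": "remote", "queue_s": 0.0},   # would need a 14 GB download
    "B": {"tier": "dram",   "queue_s": 0.3},   # checkpoint cached in host memory
    "C": {"tier": "ssd",    "queue_s": 0.0},
}
print(place(servers, model_size_gb=14.0))      # -> "B": 0.3 + 14/20 = 1.0 s
```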

[NSDI '24] Jolteon: Unleashing the Promise of Serverless for Serverless Workflows

  dblp   doi

Abstract

Serverless computing promises automatic resource provisioning to relieve the burden of developers. Yet, developers still have to manually configure resources on current serverless platforms to satisfy application-level requirements. This is because cloud applications are orchestrated as serverless workflows with multiple stages, exhibiting a complex relationship between resource configuration and application requirements. We propose Jolteon, an orchestrator to unleash the promise of automatic resource provisioning for serverless workflows. At the core of Jolteon is a stochastic performance model that combines the benefits of whitebox modeling to capture the execution characteristics of serverless computing and blackbox modeling to accommodate the inherent performance variability. We formulate a chance constrained optimization problem based on the performance model, and exploit sampling and convexity to find optimal resource configurations that satisfy user-defined cost or latency bounds. We implement a system prototype of Jolteon and evaluate it on AWS Lambda with a variety of serverless workflows. The experimental results show that Jolteon outperforms the state-of-the-art solution, Orion, by up to 2.3× on cost and 2.1× on latency.
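
A minimal sketch of the sampling side of chance-constrained configuration search, with a made-up per-stage latency model standing in for Jolteon's whitebox/blackbox model: accept a candidate configuration only if the empirical probability of meeting the latency bound reaches the target, then take the cheapest accepted one.

```python
import random

def stage_latency(mem_gb):
    """Stand-in performance model: work shrinks with resources, plus random
    variability (all numbers hypothetical)."""
    return 2.0 / mem_gb + random.gauss(0.05, 0.02)

def meets_chance_constraint(config, latency_bound_s, target_prob=0.95, samples=1000):
    """Empirically check P(sum of stage latencies <= bound) >= target_prob."""
    hits = 0
    for _ in range(samples):
        total = sum(stage_latency(m) for m in config)   # sequential stages
        hits += total <= latency_bound_s
    return hits / samples >= target_prob

def cheapest_config(candidates, latency_bound_s):
    """Among feasible configurations, minimize a simple memory*time cost proxy."""
    feasible = [c for c in candidates if meets_chance_constraint(c, latency_bound_s)]
    return min(feasible, key=lambda c: sum(m * (2.0 / m + 0.05) for m in c), default=None)

# Three-stage workflow; per-stage memory choices in GB.
candidates = [(1, 1, 1), (2, 2, 2), (4, 2, 1), (4, 4, 4)]
print(cheapest_config(candidates, latency_bound_s=4.0))   # -> (2, 2, 2) with these numbers
```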

[ISCA '24] EcoFaaS: Rethinking the Design of Serverless Environments for Energy Efficiency

  dblp   doi

Abstract

While serverless computing is increasingly popular, its energy and power consumption behavior is hardly explored. In this work, we perform a thorough characterization of the serverless environment and observe that it poses a set of challenges not effectively handled by existing energy-management schemes. Short serverless functions execute in opaque virtualized sandboxes, are idle for a large fraction of their invocation time, context switch frequently, and are co-located in a highly dynamic manner with many other functions of diverse properties. These features are a radical shift from more traditional application environments and require a new approach to managing energy and power. Driven by these insights, we design EcoFaaS, the first energy management framework for serverless environments. EcoFaaS takes a user-provided end-to-end application Service Level Objective (SLO). It then splits the SLO into per-function deadlines that minimize the total energy consumption. Based on the computed deadlines, EcoFaaS sets the optimal per-invocation core frequency using a prediction algorithm. The algorithm performs a fine-grained analysis of the execution time of each invocation, while taking into account the specific invocation inputs. To maximize efficiency, EcoFaaS splits the cores in a server into multiple Core Pools, where all the cores in a pool run at the same frequency and are controlled by a single scheduler. EcoFaaS dynamically changes the sizes and frequencies of the pools based on the current system state. We implement EcoFaaS on two open-source serverless platforms (OpenWhisk and Knative) and evaluate it using diverse serverless applications. Compared to state-of-the-art energy-management systems, EcoFaaS reduces the total energy consumption of serverless clusters by 42% while simultaneously reducing the tail latency by 34.8%.
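
To illustrate the SLO-splitting step, the following sketch uses hypothetical predictions and a simple proportional policy (not EcoFaaS's energy-optimal split): divide an end-to-end SLO across a function chain in proportion to predicted execution times, then pick the lowest core frequency that still meets each deadline, assuming runtime scales inversely with frequency.

```python
def split_slo(slo_ms, predicted_ms):
    """Split an end-to-end SLO into per-function deadlines proportional to each
    function's predicted execution time (a simple proportional heuristic)."""
    total = sum(predicted_ms.values())
    return {fn: slo_ms * t / total for fn, t in predicted_ms.items()}

def pick_frequency(deadline_ms, predicted_ms_at_fmax, freqs_ghz, fmax_ghz=3.0):
    """Lowest frequency whose scaled execution time still meets the deadline,
    assuming (hypothetically) runtime scales inversely with frequency."""
    for f in sorted(freqs_ghz):
        if predicted_ms_at_fmax * (fmax_ghz / f) <= deadline_ms:
            return f
    return fmax_ghz

predicted = {"resize": 40.0, "detect": 120.0, "notify": 20.0}
deadlines = split_slo(slo_ms=450.0, predicted_ms=predicted)
print(deadlines)                      # "detect" gets 300 ms of the 450 ms SLO
print(pick_frequency(deadlines["detect"], predicted["detect"], [1.2, 1.8, 2.4, 3.0]))  # 1.2
```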

[INFOCOM '24] Demeter: Fine-grained Function Orchestration for Geo-distributed Serverless Analytics

  dblp   doi

Abstract

In the era of global services, low-latency analytics on large-volume geo-distributed data has become a regular demand for application decision-making. Serverless computing facilitates fast function start-up and deployment, making it an attractive basis for geo-distributed analytics. We argue that the serverless paradigm holds the potential to breach current performance bottlenecks via fine-grained function orchestration. However, how to configure it for geo-distributed analytics remains ambiguous. To fill this gap, we present Demeter, a scalable fine-grained function orchestrator for geo-distributed serverless analytics systems. Demeter aims to minimize the composite cost of co-existing jobs while meeting user-specified Service Level Objectives (SLOs). To handle volatile environments and learn diverse function demands, a Multi-Agent Reinforcement Learning (MARL) solution is used to co-optimize per-function placement and resource allocation. The MARL solution extracts holistic and compact states via hierarchical graph neural networks and uses a novel actor network to shrink the huge decision space and model complexity. Finally, we implement Demeter and evaluate it using realistic workloads. The experimental results reveal that Demeter significantly saves costs by 23.3%∼32.7%, while reducing SLO violations by over 27.4%, surpassing state-of-the-art solutions.

[FAST '24] MinFlow: High-performance and Cost-efficient Data Passing for I/O-intensive Stateful Serverless Analytics

  dblp   doi

Abstract

Serverless computing has revolutionized application deployment, obviating traditional infrastructure management and dynamically allocating resources on demand. A significant use case is I/O-intensive applications like data analytics, which widely employ the pivotal "shuffle" operation. Unfortunately, the shuffle operation poses severe challenges due to the massive PUT/GET requests to remote storage, especially in high-parallelism scenarios, leading to severe performance degradation and high storage cost. Existing designs optimize data passing performance from multiple angles, but they operate in isolation, still introducing unforeseen performance bottlenecks and leaving optimization opportunities untapped. In this paper, we develop MinFlow, a holistic data passing framework for I/O-intensive serverless analytics jobs. MinFlow first rapidly generates numerous feasible multi-level data passing topologies with far fewer PUT/GET operations, then leverages an interleaved partitioning strategy to divide the topology DAG into small bipartite sub-graphs to optimize function scheduling, further reducing over half of the data transmitted to remote storage. Moreover, MinFlow also develops a precise model to determine the optimal configuration, thus minimizing data passing time under practical function deployments. We implement a prototype of MinFlow, and extensive experiments show that MinFlow significantly outperforms state-of-the-art systems, FaaSFlow and Lambada, in both job completion time and storage cost.

[EUROSYS '24] Serialization/Deserialization-free State Transfer in Serverless Workflows

  dblp   doi

Abstract

Serialization and deserialization play a dominant role in the state transfer time of serverless workflows, leading to substantial performance penalties during workflow execution. We identify the key reason as a lack of ability to efficiently access the (remote) memory of another function. We propose RMMap, an OS primitive for remote memory map. It allows a serverless function to directly access the memory of another function, even if it is located remotely. RMMap is the first to completely eliminate serialization and deserialization when transferring states between any pair of functions in (unmodified) serverless workflows. To make remote memory map efficient and feasible, we co-design it with fast networking (RDMA), the OS, the language runtime, and the serverless platform. Evaluations using real-world serverless workloads show that integrating RMMap with Knative reduces serverless workflow execution time by up to 2.6× and improves resource utilization by 86.3%.

[EUROSYS '24] Pronghorn: Effective Checkpoint Orchestration for Serverless Hot-Starts

  dblp   doi

Abstract

Serverless computing allows developers to easily deploy and scale stateless functions in ephemeral workers. As a result, serverless computing has been widely used for many applications, such as computer vision, video processing, and HTML generation. However, we find that the stateless nature of serverless computing wastes many of the important benefits modern language runtimes have to offer. A notable example is the extensive profiling and Just-in-Time (JIT) compilation effort that runtimes implement to achieve acceptable performance of popular high-level languages, such as Java, JavaScript, and Python. Unfortunately, when modern language runtimes are naively adopted in serverless computing, all of these efforts are lost upon worker eviction. Checkpoint-restore methods alleviate the problem by resuming workers from snapshots taken after initialization. However, production-grade language runtimes can take up to thousands of invocations to fully optimize a single function, thus rendering naive checkpoint-restore policies ineffective. This paper proposes Pronghorn, a serverless snapshot orchestrator that automatically monitors function performance and decides (1) when it is the right moment to take a snapshot and (2) which snapshot to use for new workers. Pronghorn is agnostic to the underlying platform and JIT runtime, thus easing its integration into existing runtimes and worker deployment environments (container, virtual machine, etc.). On a set of representative serverless benchmarks, Pronghorn provides end-to-end median latency improvements of 37.2% across 9 out of 13 benchmarks (20-58% latency reduction) when compared to state-of-the-art checkpointing policies.
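
A minimal sketch of a monitoring-driven policy in the spirit of the paper's two decisions, with hypothetical thresholds rather than Pronghorn's actual policy: keep timing invocations of a worker, take a new snapshot whenever recent latency has improved markedly over the latency captured by the last snapshot, and start new workers from the snapshot with the best recorded latency.

```python
from statistics import median

class SnapshotOrchestrator:
    """Toy policy: (1) snapshot when the JIT has made the worker noticeably
    faster than the last snapshot; (2) restore new workers from the fastest
    snapshot seen so far. Thresholds are illustrative."""

    def __init__(self, window=50, improvement=0.10):
        self.window = window            # invocations per measurement window
        self.improvement = improvement  # required relative speedup to re-snapshot
        self.latencies = []
        self.snapshots = []             # list of (median_latency_ms, snapshot_id)

    def record(self, latency_ms, take_snapshot):
        """Feed one invocation latency; call `take_snapshot()` when warranted."""
        self.latencies.append(latency_ms)
        if len(self.latencies) < self.window:
            return
        current = median(self.latencies[-self.window:])
        best = self.snapshots[-1][0] if self.snapshots else float("inf")
        if current < best * (1 - self.improvement):
            self.snapshots.append((current, take_snapshot()))
        self.latencies = self.latencies[-self.window:]

    def snapshot_for_new_worker(self):
        """Which snapshot should a new worker restore from?"""
        return min(self.snapshots)[1] if self.snapshots else None

orch = SnapshotOrchestrator()
for i, latency in enumerate([100.0] * 50 + [60.0] * 50):   # JIT warms up halfway
    orch.record(latency, take_snapshot=lambda: f"snap-{i}")
print(orch.snapshot_for_new_worker())   # the snapshot taken after the speedup
```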

[EUROSYS '24] Optimus: Warming Serverless ML Inference via Inter-Function Model Transformation

  dblp   doi

Abstract

Serverless ML inference is an emerging cloud computing paradigm for low-cost, easy-to-manage inference services. In serverless ML inference, each call is executed in a container; however, the cold start of containers results in long inference delays. Unfortunately, most existing works do not work well because they still need to load models into containers from scratch, which is the bottleneck based on our observations. Therefore, this paper proposes a low-latency serverless ML inference system called Optimus via a new container management scheme. Our key insight is that the loading of a new model can be significantly accelerated when using an existing model with a similar structure in a warm but idle container. We thus develop a novel idea of inter-function model transformation for serverless ML inference, which delves into models within containers at a finer granularity of operations, designs a set of in-container meta-operators for both CNN and transformer model transformation, and develops an efficient scheduling algorithm with linear complexity for a low-cost transformation strategy. Our evaluations on thousands of models show that Optimus reduces inference latency by 24.00% ~ 47.56% in both simulated and real-world workloads compared to state-of-the-art work.

[ASPLOS '24] CodeCrunch: Improving Serverless Performance via Function Compression and Cost-Aware Warmup Location Optimization

  dblp   doi

Abstract

Serverless computing has a critical problem of function cold starts. To minimize cold starts, state-of-the-art techniques predict function invocation times to warm them up. Warmed-up functions occupy space in memory and incur a keep-alive cost, which can become exceedingly prohibitive under bursty load. To address this issue, we design CodeCrunch, which introduces the concept of serverless function compression and exploits server heterogeneity to make serverless computing more efficient, especially under high memory pressure.

[ASPLOS '24] RainbowCake: Mitigating Cold-starts in Serverless with Layer-wise Container Caching and Sharing

  dblp   doi

Abstract

Serverless computing has grown rapidly as a new cloud computing paradigm that promises ease of management, cost efficiency, and auto-scaling by shipping functions via self-contained virtualized containers. Unfortunately, serverless computing suffers from severe cold-start problems: starting containers incurs non-trivial latency. Full container caching is widely applied to mitigate cold-starts, yet it has recently been outperformed by two lines of research: partial container caching and container sharing. However, both techniques exhibit drawbacks. Partial container caching deals effectively with burstiness while leaving cold-start mitigation halfway; container sharing reduces cold-starts by enabling containers to serve multiple functions while suffering from excessive memory waste due to over-packed containers. This paper proposes RainbowCake, a layer-wise container pre-warming and keep-alive technique that effectively mitigates cold-starts with sharing awareness at minimal memory waste. With structured container layers and sharing-aware modeling, RainbowCake is robust and tolerant to invocation bursts. We seize the opportunity for container sharing hidden in the startup process of standard container techniques. RainbowCake breaks the startup process of a container into three stages and manages different container layers individually. We develop a sharing-aware algorithm that makes event-driven, layer-wise caching decisions in real time. Experiments on OpenWhisk clusters with real-world workloads show that RainbowCake reduces function startup latency by 68% and memory waste by 77% compared to state-of-the-art solutions.
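
A minimal sketch of the layer-wise keep-alive intuition, assuming a three-layer split (user code on top of a language runtime on top of a bare container) and illustrative TTLs rather than RainbowCake's actual stages or algorithm: expiry peels one layer at a time, so the runtime and bare layers stay warm and can be shared by other functions.

```python
import time

LAYERS = ["user", "lang", "bare"]          # most specific to most shareable
TTL_S = {"user": 60, "lang": 300, "bare": 900}   # illustrative keep-alive times

class LayerCache:
    """Toy layer-wise keep-alive: expired layers are peeled off one at a time
    instead of killing the whole container, so a 'lang' or 'bare' layer can
    still serve other functions."""

    def __init__(self):
        self.containers = []               # {"fn", "runtime", "layer", "last_used"}

    def reap(self, now=None):
        now = now or time.time()
        for c in self.containers:
            while c["layer"] != "bare" and now - c["last_used"] > TTL_S[c["layer"]]:
                c["layer"] = LAYERS[LAYERS.index(c["layer"]) + 1]   # peel one layer
        self.containers = [c for c in self.containers
                           if now - c["last_used"] <= TTL_S[c["layer"]]]

    def acquire(self, fn, runtime, now=None):
        """Warm-start preference: same-function 'user' container, then a 'lang'
        container with the same runtime (only user code must be loaded), then
        any 'bare' container; otherwise a full cold start."""
        now = now or time.time()
        for want in LAYERS:
            for c in self.containers:
                fn_ok = (want != "user") or (c["fn"] == fn)
                rt_ok = (want == "bare") or (c["runtime"] == runtime)
                if c["layer"] == want and fn_ok and rt_ok:
                    c.update(fn=fn, runtime=runtime, layer="user", last_used=now)
                    return f"{want}-start"
        self.containers.append({"fn": fn, "runtime": runtime,
                                "layer": "user", "last_used": now})
        return "cold-start"
```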

[ASPLOS '24] FaaSGraph: Enabling Scalable, Efficient, and Cost-Effective Graph Processing with Serverless Computing

  dblp   doi

Abstract

Graph processing is widely used in cloud services; however, current frameworks face challenges in efficiency and cost-effectiveness when deployed under the Infrastructure-as-a-Service (IaaS) model due to its limited elasticity. In this paper, we present FaaSGraph, a serverless-native graph computing scheme that enables efficient and economical graph processing through the co-design of graph processing frameworks and serverless computing systems. Specifically, we design a data-centric serverless execution model to efficiently power heavy computing tasks. Furthermore, we carefully design a graph processing paradigm to seamlessly cooperate with the data-centric model. Our experiments show that FaaSGraph improves end-to-end performance by up to 8.3× and reduces memory usage by up to 52.4% compared to state-of-the-art IaaS-based methods. Moreover, FaaSGraph delivers steady 99th-percentile performance under highly fluctuating workloads and reduces monetary cost by 85.7%.

[ASPLOS '24] In-Storage Domain-Specific Acceleration for Serverless Computing

  dblp   doi

Abstract

While (I) serverless computing is emerging as a popular form of cloud execution, datacenters are going through major changes: (II) storage disaggregation at the system infrastructure level and (III) integration of domain-specific accelerators at the hardware level. Each of these three trends individually provides significant benefits; however, when combined the benefits diminish. On the convergence of these trends, the paper makes the observation that for serverless functions, the overhead of accessing disaggregated storage overshadows the gains from accelerators. Therefore, to benefit from all these trends in conjunction, we propose In-Storage Domain-Specific Acceleration for Serverless Computing (dubbed DSCS-Serverless). The idea contributes a serverless model that utilizes a programmable accelerator embedded within computational storage to unlock the potential of acceleration in disaggregated datacenters. Our results with eight applications show that integrating a comparatively small accelerator within the storage (DSCS-Serverless) that fits within the storage's power constraints (25 Watts) significantly outperforms a traditional disaggregated system that utilizes an NVIDIA RTX 2080 Ti GPU (250 Watts). Further, the work highlights that disaggregation, the serverless model, and the limited power budget for computation in storage devices require a different design than the conventional practices of integrating microprocessors and FPGAs. This insight is in contrast with current practices of designing computational storage devices that are yet to address the challenges associated with the shifts in datacenters. In comparison with two such conventional designs that use ARM cores or a Xilinx FPGA, DSCS-Serverless provides 3.7× and 1.7× end-to-end application speedup, 4.3× and 1.9× energy reduction, and 3.2× and 2.3× better cost efficiency, respectively.

[ASPLOS '24] FaaSMem: Improving Memory Efficiency of Serverless Computing with Memory Pool Architecture

  dblp   doi

Abstract

In serverless computing, an idle container is not recycled immediately, in order to mitigate time-consuming cold container startups. These idle containers still occupy memory, exacerbating the memory shortage of today's data centers. Offloading their cold memory to a remote memory pool could potentially resolve this problem. However, existing offloading policies either hurt Quality of Service (QoS) or are too coarse-grained for serverless computing scenarios. We therefore propose FaaSMem, a dedicated memory offloading mechanism tailored for serverless computing with a memory pool architecture. FaaSMem is based on our finding that the memory a serverless container allocates in different stages has different usage patterns. Specifically, FaaSMem proposes Page Buckets (Puckets) to segregate memory pages into different segments and applies segment-wise offloading policies to them. FaaSMem also proposes a semi-warm period during the keep-alive stage, to seek a sweet spot between offloading effort and remote access penalty. Experimental results show that FaaSMem reduces the average local memory footprint by 9.9% - 79.8% and improves container deployment density to 108% - 218%, with a negligible 95th-percentile latency increase.

[ASPLOS '24] FUYAO: DPU-enabled Direct Data Transfer for Serverless Computing

  dblp   doi

Abstract

Serverless computing typically relies on the third-party forwarding method to transmit data between functions. This method couples control flow and data flow together, resulting in significantly slow data transmission speeds. This challenge makes it difficult for the serverless computing paradigm to meet the low-latency requirements of web services. To solve this problem, we propose decoupling the control flow from the data flow, enabling direct data transfer between functions. We introduce Fuyao, the first intermediate data transfer solution capable of reducing data transfer latency to the sub-millisecond level. Fuyao provides four different data transfer methods to cater to diverse data transfer requirements within or between nodes. For function pairs that communicate frequently, Fuyao builds a stateful direct connection between them, enabling rapid inter-function data exchange. We evaluate Fuyao using real-world representative benchmarks. Experimental results show that Fuyao outperforms state-of-the-art systems by up to 57× on latency.

[USENIX '23] Sponge: Fast Reactive Scaling for Stream Processing with Serverless Frameworks

  dblp   doi

Abstract

Streaming workloads deal with data that is generated in real-time. This data is often unpredictable and changes rapidly in volume. To deal with these fluctuations, current systems aim to dynamically scale in and out, redistribute, and migrate computing tasks across a cluster of machines. While many prior works have focused on reducing the overhead of system reconfiguration and state migration on pre-allocated cluster resources, these approaches still face significant challenges in meeting latency SLOs at low operational costs, especially upon facing unpredictable bursty loads. In this paper, we propose Sponge, a new stream processing system that enables fast reactive scaling of long-running stream queries by leveraging serverless framework (SF) instances. Sponge absorbs sudden, unpredictable increases in input loads from existing VMs with low latency and cost by taking advantage of the fact that SF instances can be initiated quickly, in just a few hundred milliseconds. Sponge efficiently tracks a small number of metrics to quickly detect bursty loads and make fast scaling decisions based on these metrics. Moreover, by incorporating optimization logic at compile-time and triggering fast data redirection and partial-state merging mechanisms at runtime, Sponge avoids optimization and state migration overheads during runtime while efficiently offloading bursty loads from existing VMs to new SF instances. Our evaluation on AWS EC2 and Lambda using the NEXMark benchmark shows that Sponge promptly reacts to bursty input loads, reducing 99th-percentile tail latencies by 88% on average compared to other stream query scaling methods on VMs. Sponge also reduces cost by 83% compared to methods that over-provision VMs to handle unpredictable bursty loads.
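
A minimal sketch of a metric-driven scaling trigger in this spirit (thresholds, rates, and the per-instance capacity are all hypothetical): when the backlog implied by the input/processing rate gap would breach the latency SLO before new VMs could boot, redirect the excess to serverless-framework instances instead.

```python
def scaling_decision(input_rate, processing_rate, backlog, slo_s,
                     vm_boot_s=120.0, per_sf_rate=50.0):
    """Decide where to absorb a burst (rates in events/s; numbers illustrative).
    Returns ("none" | "vms" | "serverless", number_of_sf_instances)."""
    excess = input_rate - processing_rate
    if excess <= 0:
        return ("none", 0)
    # Backlog breaches the SLO once it exceeds slo_s * processing_rate; it grows
    # at `excess` events/s, so compute the time left before that happens.
    time_to_violation = max(slo_s * processing_rate - backlog, 0.0) / excess
    if time_to_violation > vm_boot_s:
        return ("vms", 0)                      # slow-but-cheap VM scaling suffices
    return ("serverless", int(excess / per_sf_rate) + 1)   # fast SF instances

print(scaling_decision(input_rate=900, processing_rate=600, backlog=1000, slo_s=5))
# -> ('serverless', 7): the SLO would be breached long before a VM finishes booting
```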

[USENIX '23] Decentralized and Stateful Serverless Computing on the Internet Computer Blockchain

  dblp   doi

Abstract

The Internet Computer (IC) is a fast and efficient decentralized blockchain-based platform for the execution of general-purpose applications in the form of smart contracts. In many ways, the IC service is the antithesis of current serverless computing. Instead of ephemeral, stateless functions operated by a single entity, the IC offers decentralized stateful serverless computation over untrusted, independent datacenters. Developers deploy stateful canisters that serve calls either to end-users or to other canisters. The IC programming model is similar to that of serverless clouds, with applications written in modern languages such as Rust or Python, yet simpler: state is maintained automatically, without developer intervention. In this paper, we identify and address significant systems challenges to enable efficient decentralized stateful serverless computation: scalability, stateful execution through orthogonal persistence, and deterministic scheduling. We describe the design of the IC and characterize its performance and the operational data gathered over the past 1.5 years.

[USENIX '23] UnFaaSener: Latency and Cost Aware Offloading of Functions from Serverless Platforms

  dblp   doi

Abstract

We present UnFaaSener, a lightweight framework that enables serverless users to reduce their bills by harvesting non-serverless compute resources such as their VMs, on-premise servers, or personal computers. UnFaaSener is not a new serverless platform, nor does it require any support from today's production serverless platforms. It uses existing pub/sub services as the glue between the serverless application and offloading hosts. UnFaaSener's asynchronous scheduler takes into consideration the projected resource availability of the offloading hosts, the latency and cost components of serverless versus offloaded execution, the structure of the serverless application, and the developer's QoS expectations to find optimal offloading decisions. These decisions are then stored to be retrieved and propagated through the execution flow of the serverless application. The system supports partial offloading at the resolution of individual functions and utilizes several design choices to establish confidence and adaptiveness. We evaluate the effectiveness of UnFaaSener for serverless applications with various structures. UnFaaSener delivered cost savings of up to 89.8%, depending on the invocation pattern and the structure of the application, when we limited the offloading cap to 90% in our experiments.
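
A minimal sketch of the per-function offloading decision, with invented cost and latency fields rather than UnFaaSener's actual model: offload only when the host is projected to be available, the pub/sub hop plus host execution still meets the developer's latency expectation, and the offloaded execution is cheaper.

```python
def should_offload(fn, host, qos_latency_ms):
    """Offload iff the host has spare capacity, the end-to-end offloaded latency
    (pub/sub hop + execution on the host) stays within QoS, and it saves money.

    fn   = {"serverless_ms", "serverless_cost", "host_ms"}   # illustrative fields
    host = {"available", "pubsub_ms", "cost_per_ms"}
    """
    if not host["available"]:
        return False
    offloaded_ms = host["pubsub_ms"] + fn["host_ms"]
    offloaded_cost = fn["host_ms"] * host["cost_per_ms"]
    return offloaded_ms <= qos_latency_ms and offloaded_cost < fn["serverless_cost"]

fn = {"serverless_ms": 180, "serverless_cost": 1.5e-4, "host_ms": 240}
host = {"available": True, "pubsub_ms": 40, "cost_per_ms": 2e-7}
print(should_offload(fn, host, qos_latency_ms=400))   # True: 280 ms and ~3x cheaper
```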

[SOSP '23] XFaaS: Hyperscale and Low Cost Serverless Functions at Meta

  dblp   doi

Abstract

Function-as-a-Service (FaaS) has become a popular programming paradigm in serverless computing. As the responsibility for resource provisioning shifts from users to cloud providers, the ease of use of FaaS for users may come at the expense of extra hardware costs for cloud providers. Currently, there is no report on how FaaS platforms address this challenge or the level of hardware utilization they achieve. This paper presents the FaaS platform called XFaaS in Meta's hyperscale private cloud. XFaaS currently processes trillions of function calls per day on more than 100,000 servers. We describe a set of optimizations that help XFaaS achieve a daily average CPU utilization of 66%. Based on our anecdotal knowledge, this level of utilization might be several times higher than that of typical FaaS platforms. Specifically, to eliminate the cold start time of functions, XFaaS strives to approximate the effect that every worker can execute every function immediately. To handle load spikes without over-provisioning resources, XFaaS defers the execution of delay-tolerant functions to off-peak hours and globally dispatches function calls across datacenter regions. To prevent functions from overloading downstream services, XFaaS uses a TCP-like congestion-control mechanism to pace the execution of functions.
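
A minimal sketch of what a TCP-like pacing mechanism can look like (an AIMD window per downstream service; constants are illustrative, not Meta's): grow the allowed number of in-flight calls additively on success, and cut the window multiplicatively when the downstream signals overload.

```python
class DownstreamPacer:
    """Additive-increase / multiplicative-decrease window on in-flight calls
    toward one downstream service."""

    def __init__(self, window=1.0, max_window=1000.0):
        self.window = window
        self.max_window = max_window
        self.in_flight = 0

    def may_dispatch(self):
        """Can another function call to this downstream be started now?"""
        return self.in_flight < int(self.window)

    def on_dispatch(self):
        self.in_flight += 1

    def on_response(self, overloaded):
        """overloaded=True for e.g. an HTTP 429/503 or a timeout from the downstream."""
        self.in_flight -= 1
        if overloaded:
            self.window = max(1.0, self.window / 2)                              # MD
        else:
            self.window = min(self.max_window, self.window + 1.0 / self.window)  # AI
```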

[SOSP '23] Halfmoon: Log-Optimal Fault-Tolerant Stateful Serverless Computing

  dblp   doi

Abstract

Serverless computing separates function execution from state management. Simple retry-based fault tolerance might corrupt the shared state with duplicate updates. Existing solutions employ log-based fault tolerance to achieve exactly-once semantics, where every single read or write to the external state is associated with a log entry for deterministic replay. However, logging is not a free lunch; it introduces considerable overhead to stateful serverless applications. We present Halfmoon, a serverless runtime system for fault-tolerant stateful serverless computing. Our key insight is that it is unnecessary to symmetrically log both reads and writes. Instead, it suffices to log either reads or writes, i.e., asymmetrically. We design two logging protocols that enforce exactly-once semantics while providing log-free reads or writes, which are suitable for read- and write-intensive workloads, respectively. We theoretically prove that the two protocols are log-optimal, i.e., no other protocols can achieve lower logging overhead than ours. We provide a criterion for choosing the right protocol for a given workload, and a pauseless switching mechanism to switch protocols for dynamic workloads. We implement a prototype of Halfmoon. Experiments show that Halfmoon achieves 20%–40% lower latency and 1.5–4.0× lower logging overhead than the state-of-the-art solution Boki.
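
The selection criterion can be illustrated with a tiny sketch (a simple per-operation cost proxy, not Halfmoon's actual criterion): since each protocol logs only one side, the expected logging overhead tracks the count of the side that is logged, so a read-intensive workload should pick the write-logging protocol and vice versa.

```python
def choose_protocol(read_count, write_count, log_cost_per_op=1.0):
    """Pick which side to log: the write-logging protocol gives log-free reads
    (good for read-intensive workloads) and vice versa."""
    log_reads_cost = read_count * log_cost_per_op    # overhead if reads are logged
    log_writes_cost = write_count * log_cost_per_op  # overhead if writes are logged
    return "log-writes" if log_writes_cost <= log_reads_cost else "log-reads"

print(choose_protocol(read_count=900, write_count=100))   # -> "log-writes"
print(choose_protocol(read_count=80,  write_count=700))   # -> "log-reads"
```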

[SIGCOMM '23] Ditto: Efficient Serverless Analytics with Elastic Parallelism

  dblp   doi

Abstract

Serverless computing provides fine-grained resource elasticity for data analytics: a job can flexibly scale its resources for each stage, instead of sticking to a fixed pool of resources throughout its lifetime. Due to different data dependencies and different shuffling overheads caused by intra- and inter-server communication, the best degree of parallelism (DoP) for each stage varies with runtime conditions. We present Ditto, a job scheduler for serverless analytics that leverages fine-grained resource elasticity to optimize for job completion time (JCT) and cost. The key idea of Ditto is to use a new scheduling granularity, the stage group, to decouple parallelism configuration from function placement. Ditto bundles stages into stage groups based on their data dependencies and I/O characteristics. It exploits the parallelized time characteristics of the stages to determine the parallelism configuration, and prioritizes the placement of stage groups with large shuffling traffic, so that the stages in these groups can leverage zero-copy intra-server communication for efficient shuffling. We build a system prototype of Ditto and evaluate it with a variety of benchmarking workloads. Experimental results show that Ditto outperforms existing solutions by up to 2.5× on JCT and up to 1.8× on cost.

[SC '23] Rethinking Deployment for Serverless Functions: A Performance-First Perspective

  dblp   doi

Abstract

Serverless computing commonly adopts strong isolation mechanisms for deploying functions, which can bring significant performance overhead because each function needs to run in a completely new environment (i.e., the "one-to-one" model). To accelerate function computation, prior work has proposed sandbox sharing to reduce this overhead, i.e., the "many-to-one" model. Nonetheless, either process-based true parallelism or thread-based pseudo-parallelism still causes high latency, preventing its adoption for latency-sensitive web services. To achieve optimal performance and resource efficiency for serverless workflows, we argue for an "m-to-n" deployment model that manipulates multiple granularities of computing abstractions, such as processes, threads, and sandboxes, to amortize overhead. We propose wrap, a new deployment abstraction that balances the trade-offs between interaction overhead, startup overhead, and function execution. We further design Chiron, a wrap-based deployment manager that can automatically orchestrate multiple computing abstractions based on performance prioritization. Our comprehensive evaluation indicates that Chiron outperforms state-of-the-art systems by 1.3×-21.8× on system throughput.

[OSDI '23] No Provisioned Concurrency: Fast RDMA-codesigned Remote Fork for Serverless Computing

  dblp   doi

Abstract

Serverless platforms essentially face a tradeoff between container startup time and provisioned concurrency (i.e., cached instances), which is further exacerbated by the frequent need for remote container initialization. This paper presents MITOSIS, an operating system primitive that provides fast remote fork by exploiting a deep codesign of the OS kernel with RDMA. By leveraging the fast remote read capability of RDMA and partial state transfer across serverless containers, MITOSIS bridges the performance gap between local and remote container initialization. MITOSIS is the first to fork over 10,000 new containers from one instance across multiple machines within a second, while allowing the new containers to efficiently transfer the pre-materialized states of the forked one. We have implemented MITOSIS on Linux and integrated it with FN, a popular serverless platform. Under load spikes in real-world serverless workloads, MITOSIS reduces function tail latency by 89% with orders of magnitude lower memory usage. For serverless workflows that require state transfer, MITOSIS improves execution time by 86%.

[OSDI '23] Automated Verification of Idempotence for Stateful Serverless Applications

  dblp   doi

Abstract

Serverless computing has become a popular cloud computing paradigm. By default, when a serverless function fails, the serverless platform re-executes the function to tolerate the failure. However, such a retry-based approach requires functions to be idempotent, which means that functions should expose the same behavior regardless of retries. This requirement is challenging for developers, especially when functions are stateful. Failures may cause functions to repeatedly read and update shared states, potentially corrupting data consistency. This paper presents Flux, the first toolkit that automatically verifies the idempotence of serverless applications. It proposes a new correctness definition, idempotence consistency, which stipulates that a serverless function’s retry is transparent to users. To verify idempotence consistency, Flux defines a novel property, idempotence simulation, which decomposes the proof for a concurrent serverless application into the reasoning of individual functions. Furthermore, Flux extends existing verification techniques to realize automated reasoning, enabling Flux to identify idempotence-violating operations and fix them with existing log-based methods. We demonstrate the efficacy of Flux with 27 representative serverless applications. Flux has successfully identified previously unknown issues in 12 applications. Developers have confirmed 8 issues. Compared to state-of-the-art systems (namely Beldi and Boki) that log every operation, Flux achieves up to 6× lower latency and 10× higher peak throughput, as it logs only the identified idempotence-violating ones.
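
For intuition, here is a toy example (not Flux's code) of the kind of idempotence-violating operation such a tool flags, together with the log-based style of fix: a plain read-modify-write on shared state double-charges when the platform retries, whereas recording the outcome under the request id makes the retry observe the first execution's effect, matching the idempotence-consistency notion above.

```python
# In-memory stand-ins for an external datastore and an operation log.
balance = {"alice": 100}
op_log = {}

def charge_naive(request_id, user, amount):
    """Not idempotent: a platform retry applies the debit twice."""
    balance[user] -= amount

def charge_logged(request_id, user, amount):
    """Idempotent: the first execution records its effect under request_id,
    and any retry replays that record instead of re-updating the state."""
    if request_id in op_log:
        return op_log[request_id]
    balance[user] -= amount
    op_log[request_id] = balance[user]
    return op_log[request_id]

charge_naive("r1", "alice", 10); charge_naive("r1", "alice", 10)   # retry of r1
print(balance["alice"])   # 80: the retry corrupted the state

balance["alice"] = 100
charge_logged("r2", "alice", 10); charge_logged("r2", "alice", 10)  # retry of r2
print(balance["alice"])   # 90: the retry is transparent to users
```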

[NSDI '23] Following the Data, Not the Function: Rethinking Function Orchestration in Serverless Computing

  dblp   doi

Abstract

Serverless applications are typically composed of function workflows in which multiple short-lived functions are triggered to exchange data in response to events or state changes. Current serverless platforms coordinate and trigger functions by following high-level invocation dependencies but are oblivious to the underlying data exchanges between functions. This design is neither efficient nor easy to use in orchestrating complex workflows – developers often have to manage complex function interactions by themselves, with customized implementation and unsatisfactory performance.In this paper, we argue that function orchestration should follow a data-centric approach. In our design, the platform provides a data bucket abstraction to hold the intermediate data generated by functions. Developers can use a rich set of data trigger primitives to control when and how the output of each function should be passed to the next functions in a workflow. By making data consumption explicit and allowing it to trigger functions and drive the workflow, complex function interactions can be easily and efficiently supported. We present Pheromone – a scalable, low-latency serverless platform following this data-centric design. Compared to well-established commercial and open-source platforms, Pheromone cuts the latencies of function interactions and data exchanges by orders of magnitude, scales to large workflows, and enables easy implementation of complex applications.
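
A minimal sketch of what a data-trigger primitive could look like (the API names here are invented for illustration, not Pheromone's): a bucket collects named intermediate objects, and a registered trigger fires the downstream function once its condition over the bucket's contents holds, e.g., a fan-in that waits for all mapper outputs.

```python
class Bucket:
    """Toy data bucket: functions put intermediate objects here, and registered
    triggers decide when (and with what data) to invoke the next function."""

    def __init__(self):
        self.objects = {}
        self.triggers = []      # list of (condition, target_function)

    def register_trigger(self, condition, target):
        self.triggers.append((condition, target))

    def put(self, key, value):
        self.objects[key] = value
        for condition, target in self.triggers:
            if condition(self.objects):
                target(dict(self.objects))   # drive the workflow from the data

# Example: fan-in -- invoke `aggregate` only once both mappers have produced output.
def aggregate(objs):
    print("aggregate sees:", sorted(objs))

bucket = Bucket()
bucket.register_trigger(lambda objs: {"map-0", "map-1"} <= objs.keys(), aggregate)
bucket.put("map-0", [1, 2])      # nothing fires yet
bucket.put("map-1", [3, 4])      # condition met -> aggregate is invoked
```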

[NSDI '23] Doing More with Less: Orchestrating Serverless Applications without an Orchestrator

  dblp   doi

Abstract

Standalone orchestrators simplify the development of serverless applications by providing higher-level programming interfaces, coordinating function interactions, and ensuring exactly-once execution. However, they limit application flexibility and are expensive to use. We show that these specialized orchestration services are unnecessary. Instead, application-level orchestration, deployed as a library, can support the same programming interfaces, complex interactions, and execution guarantees, utilizing only basic serverless components that are already universally supported and billed on a fine-grained per-use basis. Furthermore, application-level orchestration affords applications more flexibility and reduces costs for both providers and users. To demonstrate this, we present Unum, an application-level serverless orchestration system. Unum introduces an intermediate representation that partitions higher-level application definitions at compile time and provides orchestration as a runtime library that executes in situ with user-defined FaaS functions. On unmodified serverless infrastructures, Unum functions coordinate and ensure correctness in a decentralized manner by leveraging strongly consistent data stores. Compared with AWS Step Functions, a state-of-the-art standalone orchestrator, our evaluation shows that Unum performs well, costs significantly less, and grants applications greater flexibility to employ application-specific patterns and optimizations. For a representative set of applications, Unum runs as much as 2x faster and costs 9x cheaper.
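
A minimal sketch of how library-level orchestration can get exactly-once continuation triggering from a strongly consistent store (an in-memory dict with put-if-absent stands in for e.g. a conditional write in a cloud datastore; names are illustrative, not Unum's API): whichever retry claims a step first invokes the next function, and every duplicate attempt becomes a no-op.

```python
import threading

class ConsistentStore:
    """Stand-in for a strongly consistent KV store offering put-if-absent."""
    def __init__(self):
        self._data, self._lock = {}, threading.Lock()

    def put_if_absent(self, key, value):
        with self._lock:
            if key in self._data:
                return False
            self._data[key] = value
            return True

store = ConsistentStore()

def run_step(workflow_id, step_name, payload, invoke_next):
    """Orchestration-as-a-library: claim the step atomically, then invoke the
    next function exactly once even if this function is retried."""
    if store.put_if_absent(f"{workflow_id}/{step_name}", "claimed"):
        invoke_next(payload)          # only the first claimant reaches this
    # duplicates simply return: the continuation was already triggered

calls = []
for _ in range(3):                    # simulate duplicate retries of the same step
    run_step("wf-42", "resize-done", {"img": "a.png"}, calls.append)
print(len(calls))                     # 1
```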

[MICRO '23] Memento: Architectural Support for Ephemeral Memory Management in Serverless Environments

  dblp   doi

Abstract

Serverless computing is an increasingly attractive paradigm in the cloud due to its ease of use and fine-grained pay-for-what-you-use billing. However, serverless computing poses new challenges to system design due to its short-lived function execution model. Our detailed analysis reveals that memory management is responsible for a major share of function execution cycles. This is because functions pay the full critical-path costs of memory management in both userspace and the operating system without the opportunity to amortize these costs over their short lifetimes. To address this problem, we propose Memento, a new hardware-centric memory management design based upon our insights that memory allocations in serverless functions are typically small, and either quickly freed after allocation or freed when the function exits. Memento alleviates the overheads of serverless memory management by introducing two key mechanisms: (i) a hardware object allocator that performs in-cache memory allocation and free operations based on arenas, and (ii) a hardware page allocator that manages a small pool of physical pages used to replenish arenas of the object allocator. Together these mechanisms alleviate memory management overheads and bypass costly userspace and kernel operations. Memento naturally integrates with existing software stacks through a set of ISA extensions that enable seamless integration with multiple language runtimes. Finally, Memento leverages the newly exposed memory allocation semantics in hardware to introduce a main memory bypass mechanism and avoid unnecessary DRAM accesses for newly allocated objects. We evaluate Memento with full-system simulations across a diverse set of containerized serverless workloads and language runtimes. The results show that Memento achieves function execution speedups ranging between 8–28% and 16% on average. Furthermore, Memento's hardware allocators and main memory bypass mechanism drastically reduce main memory traffic by 30% on average. The combined effects of Memento reduce the pricing cost of function execution by 29%. Finally, we demonstrate the applicability of Memento beyond functions, to major serverless platform operations and long-running data processing applications.

[ISCA '23] MXFaaS: Resource Sharing in Serverless Environments for Parallelism and Efficiency

  dblp   doi

Abstract

Although serverless computing is a popular paradigm, current serverless environments have high overheads. Recently, it has been shown that serverless workloads frequently exhibit bursts of invocations of the same function. Such a pattern is not handled well in current platforms. Supporting it efficiently can speed up serverless execution substantially. In this paper, we target this dominant pattern with a new serverless platform design named MXFaaS. MXFaaS improves function performance by efficiently multiplexing (i.e., sharing) processor cycles, I/O bandwidth, and memory/processor state between concurrently executing invocations of the same function. MXFaaS introduces a new container abstraction called MXContainer. To enable efficient use of processor cycles, an MXContainer carefully helps schedule same-function invocations for minimal response time. To enable efficient use of I/O bandwidth, an MXContainer coalesces remote storage accesses and remote function calls from same-function invocations. Finally, to enable efficient use of memory/processor state, an MXContainer first initializes the state of its container and only later, on demand, spawns a process per function invocation, so that all invocations can share unmodified memory state and hence minimize the memory footprint. We implement MXFaaS in two serverless platforms and run diverse serverless benchmarks. With MXFaaS, serverless environments are much more efficient. Compared to a state-of-the-art serverless environment, MXFaaS on average speeds up execution by 5.2×, reduces P99 tail latency by 7.4×, and improves throughput by 4.8×. In addition, it reduces average memory usage by 3.4×.

[INFOCOM '23] DisProTrack: Distributed Provenance Tracking over Serverless Applications

  dblp   doi

Abstract

Provenance tracking has been widely used in the recent literature to debug system vulnerabilities and find the root causes behind faults, errors, or crashes in a running system. However, existing approaches primarily developed graph-based models for provenance tracking over monolithic applications running directly on the operating system kernel. In contrast, the modern DevOps-based service-oriented architecture relies on distributed platforms, like serverless computing, that use container-based sandboxing over the kernel. Provenance tracking over such a distributed micro-service architecture is challenging, as the application and system logs are generated asynchronously and follow heterogeneous nomenclature and logging formats. This paper develops a novel approach to combining system and micro-service logs to generate a Universal Provenance Graph (UPG) that can be used for provenance tracking over serverless architectures. We develop a Loadable Kernel Module (LKM) for runtime unit identification over the logs by intercepting system calls with help from control flow graphs over the static application binaries. Finally, we design a regular expression-based log optimization method for reverse query parsing over the generated UPG. A thorough evaluation of the proposed UPG model with different benchmarked serverless applications shows the system's effectiveness.

[INFOCOM '23] Enabling Age-Aware Big Data Analytics in Serverless Edge Clouds

  dblp   doi

Abstract

With the fast development of artificial intelligence applications, large volumes of big data generated at the edge of networks await real-time analysis so that valuable information can be unveiled. Developers of big data analytics applications usually face the burden of managing the underlying cloud resources, which greatly slows analytic development. Serverless computing is envisioned as an enabling technology to relieve developers of this management burden and to enable agile big data analytics. That is, big data analytics can be implemented in short-lived functions via the Function-as-a-Service (FaaS) programming paradigm. In this paper, we aim to fill the gap between serverless computing and mobile edge computing by enabling query evaluation for big data analytics in short-lived functions of a serverless edge cloud (SEC). Specifically, we formulate novel age-aware big data query evaluation problems in an SEC so that the age of data is minimized, where the age of data is defined as the time difference between the current time and the generation time of the dataset. We propose approximation algorithms for the age-aware big data query evaluation problem with a single query, using a novel parameterized virtualization technique that strives for a fine trade-off between short-lived functions and the large resource demands of big data queries. We also devise an online learning algorithm with bounded regret for the problem with multiple queries arriving dynamically and without prior knowledge of the queries' resource demands. We finally evaluate the performance of the proposed algorithms by extensive simulations. Simulation results show that the performance of our algorithms is promising.

[INFOCOM '23] On Efficient Zygote Container Planning toward Fast Function Startup in Serverless Edge Cloud

  dblp   doi

Abstract

Container cold startup is regarded as a crucial problem for the performance of serverless computing, especially in resource-constrained edge clouds. Pre-warming hot containers has been proven an effective solution but comes at the expense of high memory consumption. Instead of pre-warming a complete container for a function, recent studies advocate the Zygote container, which pre-imports some packages and imports the remaining dependent packages at runtime, so as to avoid the cold startup problem. However, as different functions have different package dependencies, how to plan Zygote generation and pre-warming in a resource-constrained edge cloud becomes a critical challenge. In this paper, aiming to minimize the overall function startup time subject to resource capacity constraints, we formulate this problem as a Quadratic Integer Programming (QIP) problem. We further propose a Randomized Rounding based Zygote Planning (RRZP) algorithm. The efficiency of our algorithm is proven via both theoretical analysis and trace-driven simulations. The results show that our algorithm can significantly reduce the startup time by 25.6%.

[INFOCOM '23] Online Container Scheduling for Data-intensive Applications in Serverless Edge Computing

  dblp   doi

Abstract

Introducing the emerging serverless paradigm into edge computing could avoid over- and under-provisioning of limited edge resources and make complex edge resource management transparent to application developers, which largely facilitates the cost-effectiveness, portability, and short time-to-market of edge applications. However, the computation/data dispersion and device/network heterogeneity of edge environments prevent current serverless computing platforms from acclimating to the network edge. In this paper, we address such challenges by formulating a container placement and data flow routing problem, which fully considers the heterogeneity of edge networks and the overhead of operating serverless platforms on resource-limited edge servers. We design an online algorithm to solve the problem. We further show its local optimum for each arriving container and prove its theoretical guarantee to the optimal offline solution. We also conduct extensive simulations based on practical experiment results to show the advantages of the proposed algorithm over existing baselines.

[INFOCOM '23] Time and Cost-Efficient Cloud Data Transmission based on Serverless Computing Compression

  dblp   doi

Abstract

Nowadays, there is substantial demand for cross-region data transmission in the cloud. It is promising to use serverless computing to compress data and thus reduce the amount of data transmitted. However, it is challenging to estimate the data transmission time and monetary cost with serverless compression. In addition, minimizing the data transmission cost is non-trivial due to the enormous parameter space and joint optimization. This paper focuses on this problem and makes the following contributions: (1) We propose empirical data transmission time and monetary cost models based on serverless compression. (2) For single-task cloud data transmission, we propose two efficient parameter search methods based on Sequential Quadratic Programming (SQP) and Eliminate then Divide and Conquer (EDC), which are theoretically proven with error upper bounds. (3) Furthermore, for multi-task cloud data transmission, a parameter search method based on dynamic programming and numerical computation is proposed to reduce the algorithm complexity from exponential to linear. We have implemented the entire system and evaluated it with various workloads and application cases on the real-world AWS serverless computing platform. Experimental results on a cross-region public cloud show that the proposed approach can improve parameter search efficiency by more than 3× compared with state-of-the-art parameter search methods and achieves better parameter quality. Compared with other competing cloud data transmission approaches, our approach achieves higher time efficiency and lower monetary cost.

[INFOCOM '23] EAVS: Edge-assisted Adaptive Video Streaming with Fine-grained Serverless Pipelines

  dblp   doi

Abstract

Recent years have witnessed video streaming gradually evolve into one of the most popular Internet applications. With the rapidly growing personalized demand for real-time video streaming services, maximizing their Quality of Experience (QoE) is a long-standing challenge. The emergence of the serverless computing paradigm has the potential to meet this challenge through its fine-grained management and highly parallel computing structures. However, it remains unclear how to implement and configure serverless components to optimize video streaming services. In this paper, we propose EAVS, an Edge-assisted Adaptive Video streaming system with Serverless pipelines, which facilitates fine-grained management for multiple concurrent video transmission pipelines. Then, we design a chunk-level optimization scheme to address video bitrate adaptation. We propose a Deep Reinforcement Learning (DRL) algorithm based on Proximal Policy Optimization (PPO) with a trinal-clip mechanism to make bitrate decisions efficiently for better QoE. Finally, we implement the serverless video streaming system prototype and evaluate the performance of EAVS on various real-world network traces. Our results show that EAVS significantly improves QoE and reduces the video stall rate, achieving over 9.1% QoE improvement and 60.2% latency reduction compared to state-of-the-art solutions.

[HPCA '23] SpecFaaS: Accelerating Serverless Applications with Speculative Function Execution

  dblp   doi

Abstract

Serverless computing has emerged as a popular cloud computing paradigm. Serverless environments are convenient to users and efficient for cloud providers. However, they can induce substantial application execution overheads, especially in applications with many functions. In this paper, we propose to accelerate serverless applications with a novel approach based on software-supported speculative execution of functions. Our proposal is termed Speculative Function-as-a-Service (SpecFaaS). It is inspired by out-of-order execution in modern processors, and is grounded in a characterization analysis of FaaS applications. In SpecFaaS, functions in an application are executed early, speculatively, before their control and data dependences are resolved. Control dependences are predicted like in pipeline branch prediction, and data dependences are speculatively satisfied with memoization. With this support, the execution of downstream functions is overlapped with that of upstream functions, substantially reducing the end-to-end execution time of applications. We prototype SpecFaaS on Apache OpenWhisk, an open-source serverless computing platform. For a set of applications in a warmed-up environment, SpecFaaS attains an average speedup of 4.6×. Further, on average, the application throughput increases by 3.9× and the tail latency decreases by 58.7%.
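
As a rough illustration of the idea (not SpecFaaS itself), the Python sketch below speculatively launches a downstream function using a memoized prediction of the upstream function's output, validates the prediction, and falls back to re-execution on mis-speculation. The two-function chain, the memo table, and all values are hypothetical.

```python
import concurrent.futures as cf

# Hypothetical two-function chain: f1's output is f2's input.
def f1(x):
    return x * 2

def f2(y):
    return y + 1

MEMO = {"f1": {}}  # past input -> output, used to predict f1's result

def run_chain_speculatively(x, executor):
    predicted = MEMO["f1"].get(x)
    # Start f2 early on the predicted value, overlapping it with f1's execution.
    spec = executor.submit(f2, predicted) if predicted is not None else None
    actual = f1(x)
    MEMO["f1"][x] = actual
    if spec is not None and predicted == actual:
        return spec.result()          # speculation validated: reuse overlapped work
    if spec is not None:
        spec.cancel()                 # mis-speculation: discard and re-execute
    return f2(actual)

with cf.ThreadPoolExecutor() as ex:
    print(run_chain_speculatively(3, ex))  # cold: no prediction, runs sequentially
    print(run_chain_speculatively(3, ex))  # warm: f2 starts before f1 finishes
```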

[EUROSYS '23] Palette Load Balancing: Locality Hints for Serverless Functions

  dblp   doi

Abstract

Function-as-a-Service (FaaS) serverless computing enables a simple programming model with almost unbounded elasticity. Unfortunately, current FaaS platforms achieve this flexibility at the cost of lower performance for data-intensive applications compared to a serverful deployment. The ability to have computation close to data is a key missing feature. We introduce Palette load balancing, which offers FaaS applications a simple mechanism to express locality to the platform, through hints we term "colors". Palette maintains the serverless nature of the service - users are still not allocating resources - while allowing the platform to place successive invocations related to each other on the same executing node. We compare a prototype of the Palette load balancer to a state-of-the-art locality-oblivious load balancer on representative examples of three applications. For a serverless web application with a local cache, Palette improves the hit ratio by 6x. For a serverless version of Dask, Palette improves run times by 46% and 40% on Task Bench and TPC-H, respectively. On a serverless version of NumS, Palette improves run times by 37%. These improvements largely bridge the gap to serverful implementations of the same systems.
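
As a rough illustration (not Palette's implementation), the sketch below routes invocations by hashing a caller-supplied "color" hint onto a consistent-hashing ring, so invocations sharing a color land on the same node and can reuse its local state; node names and the hashing scheme are assumptions.

```python
import hashlib
from bisect import bisect_right

class ColorAwareBalancer:
    """Toy load balancer: invocations sharing a 'color' hint land on the same node."""

    def __init__(self, nodes, vnodes=64):
        # Build a consistent-hashing ring so adding/removing nodes moves few colors.
        self.ring = sorted(
            (self._hash(f"{n}#{i}"), n) for n in nodes for i in range(vnodes)
        )
        self.keys = [k for k, _ in self.ring]

    @staticmethod
    def _hash(s):
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def route(self, function, color=None):
        # Uncolored invocations fall back to hashing the function name alone.
        key = self._hash(color if color is not None else function)
        idx = bisect_right(self.keys, key) % len(self.ring)
        return self.ring[idx][1]

balancer = ColorAwareBalancer(["node-a", "node-b", "node-c"])
# Two invocations with the same color hit the same node, warming its local cache.
print(balancer.route("render_thumbnail", color="user-42"))
print(balancer.route("render_thumbnail", color="user-42"))
```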

[EUROSYS '23] With Great Freedom Comes Great Opportunity: Rethinking Resource Allocation for Serverless Functions

  dblp   doi

Abstract

Current serverless offerings give users limited flexibility for configuring the resources allocated to their function invocations. This simplifies the interface for users to deploy serverless computations but creates deployments that are resource inefficient. In this paper, we take a principled approach to the problem of resource allocation for serverless functions, analyzing the effects of automating this choice in a way that leads to the best combination of performance and cost. In particular, we systematically explore the opportunities that come with decoupling memory and CPU resource allocations and also enabling the use of different VM types, and we find a rich trade-off space between performance and cost. The provider can use this in a number of ways, e.g., exposing all these parameters to the user; eliding preferences for performance and cost from users and simply offering the same performance at lower cost; or exposing a small number of choices for users to trade performance for cost. Our results show that, by decoupling memory and CPU allocation, there is the potential to have up to 40% lower execution cost than the preset coupled configurations that are the norm in current serverless offerings. Similarly, making the correct choice of VM instance type can provide up to 50% better execution time. Furthermore, we demonstrate that providers have the flexibility to choose different instance types for the same functions to maximize resource utilization while providing performance within 10--20% of the best resource configuration for each respective function.
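
To make the trade-off space concrete, here is a hedged sketch of the kind of search a provider could run over decoupled memory/CPU allocations and VM types: enumerate profiled configurations and pick the cheapest one that meets an optional latency target. The profile table and prices are invented for illustration and are not the paper's measurements.

```python
# Hypothetical offline profile: (vm_type, cpu_cores, memory_mb) -> measured latency (s).
PROFILE = {
    ("general", 1, 512): 2.4, ("general", 2, 512): 1.4, ("general", 2, 1024): 1.3,
    ("compute", 1, 512): 1.8, ("compute", 2, 512): 1.0, ("compute", 2, 1024): 0.95,
}
# Illustrative per-second prices for cores and memory on each VM type.
PRICE = {"general": {"core": 0.000020, "mb": 0.00000002},
         "compute": {"core": 0.000034, "mb": 0.00000002}}

def invocation_cost(cfg, latency_s):
    vm, cores, mem = cfg
    rate = cores * PRICE[vm]["core"] + mem * PRICE[vm]["mb"]
    return rate * latency_s

def best_config(latency_slo_s=None):
    # Keep only profiled configurations meeting the SLO, then take the cheapest.
    feasible = [(cfg, lat) for cfg, lat in PROFILE.items()
                if latency_slo_s is None or lat <= latency_slo_s]
    return min(feasible, key=lambda p: invocation_cost(*p), default=None)

print(best_config())                   # cheapest configuration overall
print(best_config(latency_slo_s=1.2))  # cheapest configuration meeting a 1.2 s target
```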

[ASPLOS '23] ElasticFlow: An Elastic Serverless Training Platform for Distributed Deep Learning

  dblp   doi

Abstract

This paper proposes ElasticFlow, an elastic serverless training platform for distributed deep learning. ElasticFlow provides a serverless interface with two distinct features: (i) users specify only the deep neural network (DNN) model and hyperparameters for a job, but not the number of GPUs; (ii) users specify the deadline for a job, but not the amount of time to occupy GPUs. In contrast to existing server-centric platforms, ElasticFlow provides performance guarantees in terms of meeting deadlines while alleviating tedious, low-level, and manual resource management for deep learning developers. The characteristics of distributed training introduce two challenges. First, the training throughput scales non-linearly with the number of GPUs. Second, the scaling efficiency is affected by worker placement. To address these challenges, we propose Minimum Satisfactory Share to capture the resource usage of training jobs to meet deadlines, and ElasticFlow performs admission control based on it. We develop a greedy algorithm that dynamically allocates resources to admitted jobs based on diminishing returns. We apply buddy allocation to worker placement to eliminate the effect of topology. Evaluation results on a cluster of 128 GPUs show that ElasticFlow increases the number of jobs that can meet their deadlines by 1.46–7.65× compared to existing solutions.
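
A minimal sketch of the admission-control idea under an assumed (hypothetical) non-linear scaling profile: find the smallest GPU share that still meets the deadline and admit the job only if that share is available. The numbers are illustrative, not ElasticFlow's.

```python
# Hypothetical non-linear scaling profile: GPU count -> training throughput (samples/s).
THROUGHPUT = {1: 100, 2: 190, 4: 350, 8: 600, 16: 950}

def minimum_satisfactory_share(remaining_samples, seconds_to_deadline):
    """Smallest profiled GPU count that still finishes before the deadline, or None."""
    for gpus in sorted(THROUGHPUT):
        if remaining_samples / THROUGHPUT[gpus] <= seconds_to_deadline:
            return gpus
    return None

def admit(job, free_gpus):
    # Admit only if the job's minimum satisfactory share fits in what is free.
    share = minimum_satisfactory_share(job["samples"], job["deadline_s"])
    return share is not None and share <= free_gpus

job = {"samples": 1_000_000, "deadline_s": 3600}
print(minimum_satisfactory_share(job["samples"], job["deadline_s"]))  # 4 for this profile
print(admit(job, free_gpus=2))                                        # False: would miss deadline
```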

[ASPLOS '23] DataFlower: Exploiting the Data-flow Paradigm for Serverless Workflow Orchestration

  dblp   doi

Abstract

Serverless computing that runs functions with auto-scaling is a popular task execution pattern in the cloud-native era. By connecting serverless functions into workflows, tenants can achieve complex functionality. Prior research adopts the control-flow paradigm to orchestrate a serverless workflow. However, the control-flow paradigm inherently results in long response latency, due to the heavy data persistence overhead, sequential resource usage, and late function triggering. Our investigation shows that the data-flow paradigm has the potential to resolve the above problems, with careful design and optimization. We propose DataFlower, a scheme that achieves the data-flow paradigm for serverless workflows. In DataFlower, a container is abstracted to be a function logic unit and a data logic unit. The function logic unit runs the functions, and the data logic unit handles the data transmission asynchronously. Moreover, a host-container collaborative communication mechanism is used to support efficient data transfer. Our experimental results show that compared to state-of-the-art serverless designs, DataFlower reduces the 99%-ile latency of the benchmarks by up to 35.4%, and improves the peak throughput by up to 3.8X.

[ASPLOS '23] Flame: A Centralized Cache Controller for Serverless Computing

  dblp   doi

Abstract

Caching functions is a promising way to mitigate cold-start overhead in serverless computing. However, as caching also increases the resource cost significantly, how to make caching decisions is still challenging. We find that the prior "local cache control" designs are insufficient to achieve high cache efficiency due to the workload skewness across servers. In this paper, inspired by the idea of software-defined network management, we propose Flame, an efficient cache system to manage cached functions with a "centralized cache control" design. By decoupling the cache control plane from local servers and setting up a separate centralized controller, Flame is able to make caching decisions considering a global view of cluster status, enabling an optimized cache-hit ratio and resource efficiency. We evaluate Flame with real-world workloads and the evaluation results show that it can reduce the cache resource usage by 36% on average while improving the cold-start ratio by nearly 7x compared to the state-of-the-art method.
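
As a toy illustration of centralized cache control (not Flame's algorithm), the sketch below ranks functions cluster-wide by cold-start cost avoided per MB of cache and packs them onto servers greedily; all rates, sizes, and costs are made up.

```python
def plan_cached_functions(servers, invocation_rate, mem_need, init_cost):
    """Toy centralized controller: rank functions cluster-wide by cold-start cost
    avoided per MB of cache, then pack them onto servers greedily."""
    ranked = sorted(invocation_rate,
                    key=lambda f: invocation_rate[f] * init_cost[f] / mem_need[f],
                    reverse=True)
    free = dict(servers)  # server -> free cache memory (MB)
    placement = {}
    for func in ranked:
        # Put the function on the server with the most free memory that still fits it.
        target = max((s for s in free if free[s] >= mem_need[func]),
                     key=lambda s: free[s], default=None)
        if target is not None:
            free[target] -= mem_need[func]
            placement.setdefault(target, []).append(func)
    return placement

servers = {"s1": 1024, "s2": 512}                  # cache capacity per server (MB)
rate = {"auth": 50, "thumb": 5, "report": 0.2}     # invocations per minute
mem = {"auth": 256, "thumb": 512, "report": 512}   # cached-container size (MB)
cold = {"auth": 0.3, "thumb": 1.2, "report": 2.5}  # cold-start cost (s)
print(plan_cached_functions(servers, rate, mem, cold))
```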

[ASPLOS '23] λFS: A Scalable and Elastic Distributed File System Metadata Service using Serverless Functions

  dblp   doi

Abstract

The metadata service (MDS) sits on the critical path for distributed file system (DFS) operations, and therefore it is key to the overall performance of a large-scale DFS. Common "serverful" MDS architectures, such as a single server or cluster of servers, have a significant shortcoming: either they are not scalable, or they make it difficult to achieve an optimal balance of performance, resource utilization, and cost. A modern MDS requires a novel architecture that addresses this shortcoming. To this end, we design and implement λFS, an elastic, high-performance metadata service for large-scale DFSes. λFS scales a DFS metadata cache elastically on a FaaS (Function-as-a-Service) platform and synthesizes a series of techniques to overcome the obstacles that are encountered when building large, stateful, and performance-sensitive applications on FaaS platforms. λFS takes full advantage of the unique benefits offered by FaaS---elastic scaling and massive parallelism---to realize a highly-optimized metadata service capable of sustaining up to 4.13X higher throughput, 90.40% lower latency, 85.99% lower cost, 3.33X better performance-per-cost, and better resource utilization and efficiency than a state-of-the-art DFS for an industrial workload.

[USENIX '22] RunD: A Lightweight Secure Container Runtime for High-density Deployment and High-concurrency Startup in Serverless Computing

  dblp   doi

Abstract

Secure containers, each hosting a single container in a micro virtual machine (microVM), are now used in serverless computing, as the containers are isolated through the microVMs. There are high demands for high-density container deployment and high-concurrency container startup to improve both resource utilization and user experience, as user functions are fine-grained in serverless platforms. Our investigation shows that the entire software stack, comprising the cgroups in the host operating system, the guest operating system, and the container rootfs for the function workload, results in low deployment density and slow startup performance at high concurrency. We therefore propose and implement a lightweight secure container runtime, named RunD, to resolve the above problems through a holistic guest-to-host solution. With RunD, over 200 secure containers can be started in a second, and over 2500 secure containers can be deployed on a node with 384GB of memory. RunD has been adopted as the Alibaba serverless container runtime to support high-density deployment and high-concurrency startup.

[USENIX '22] Help Rather Than Recycle: Alleviating Cold Startup in Serverless Computing Through Inter-Function Container Sharing

  dblp   doi

Abstract

In serverless computing, each function invocation is executed in a container (or a virtual machine), and container cold startup results in long response latency. We observe that some functions suffer from cold container startup while the warm containers of other functions sit idle. Based on this observation, rather than booting a new container for a function from scratch, we propose to alleviate cold startup by re-purposing a warm but idle container from another function. We implement a container management scheme, named Pagurus, to this end. Pagurus comprises an intra-function manager that converts an idle warm container into a container that other functions can use without introducing additional security issues, an inter-function scheduler for scheduling containers between functions, and a sharing-aware function balancer at the cluster level for balancing the workload across different nodes. Experiments using Azure serverless traces show that Pagurus alleviates 84.6% of cold startups, and the cold startup latency is reduced from hundreds of milliseconds to 16 milliseconds when alleviated.
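
A minimal sketch of the borrowing idea (not Pagurus's actual manager): when a function has no warm container of its own, hand it an idle warm container lent by another function instead of cold-starting. The pool structure and timing constants below are assumptions.

```python
import time

class ContainerPool:
    """Toy pool: prefer this function's own warm containers, then re-purpose an
    idle warm container lent by another function, and only then cold-start."""

    COLD_START_S = 0.5   # illustrative cold-start penalty
    REPURPOSE_S = 0.016  # illustrative cost of specializing a lender container

    def __init__(self):
        self.idle = {}  # function name -> list of idle warm containers

    def release(self, func, container):
        self.idle.setdefault(func, []).append(container)

    def acquire(self, func):
        if self.idle.get(func):
            return self.idle[func].pop(), 0.0
        for other, containers in self.idle.items():
            if other != func and containers:
                return containers.pop(), self.REPURPOSE_S
        return f"new-{func}-{time.time():.0f}", self.COLD_START_S

pool = ContainerPool()
pool.release("resize_image", "ctr-1")
print(pool.acquire("transcode_video"))  # borrows ctr-1 instead of cold-starting
```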

[USENIX '22] Tetris: Memory-efficient Serverless Inference through Tensor Sharing

  dblp   doi

Abstract

Executing complex, memory-intensive deep learning inference services poses a major challenge for serverless computing frameworks, which must densely deploy and maintain inference models at high throughput. We observe an excessive memory consumption problem in serverless inference systems due to large models and high data redundancy. We present Tetris, a serverless platform catered to inference services with an order of magnitude lower memory footprint. Tetris’s design carefully considers the extensive memory sharing of runtimes and tensors. It supports minimizing the runtime redundancy through a combined optimization of batching and concurrent execution and eliminates tensor redundancy across instances from either the same or different functions using a lightweight and safe tensor mapping mechanism. Our comprehensive evaluation demonstrates that Tetris saves up to 93% memory footprint for inference services, and increases the function density by 30× without impairing the latency.

[SIGCOMM '22] SPRIGHT: extracting the server from serverless computing! high-performance eBPF-based event-driven, shared-memory processing

  dblp   doi

Abstract

Serverless computing promises an efficient, low-cost compute capability in cloud environments. However, existing solutions, epitomized by open-source platforms such as Knative, include heavyweight components that undermine this goal of serverless computing. Additionally, such serverless platforms lack dataplane optimizations to achieve efficient, high-performance function chains that facilitate the popular microservices development paradigm. Their use of unnecessarily complex and duplicate capabilities for building function chains severely degrades performance. 'Cold-start' latency is another deterrent. We describe SPRIGHT, a lightweight, high-performance, responsive serverless framework. SPRIGHT exploits shared memory processing and dramatically improves the scalability of the dataplane by avoiding unnecessary protocol processing and serialization-deserialization overheads. SPRIGHT extensively leverages event-driven processing with the extended Berkeley Packet Filter (eBPF). We creatively use eBPF's socket message mechanism to support shared memory processing, with overheads being strictly load-proportional. Compared to constantly-running, polling-based DPDK, SPRIGHT achieves the same dataplane performance with 10× less CPU usage under realistic workloads. Additionally, eBPF benefits SPRIGHT, by replacing heavyweight serverless components, allowing us to keep functions 'warm' with negligible penalty. Our preliminary experimental results show that SPRIGHT achieves an order of magnitude improvement in throughput and latency compared to Knative, while substantially reducing CPU usage, and obviates the need for 'cold-start'.

[SC '22] DayDream: Executing Dynamic Scientific Workflows on Serverless Platforms with Hot Starts

  dblp   doi

Abstract

HPC applications are increasingly being designed as dynamic workflows for the ease of development and scaling. This work demonstrates how the serverless computing model can be leveraged for efficient execution of complex, real-world scientific workflows, although serverless computing was not originally designed for executing scientific workflows. This work characterizes, quantifies, and improves the execution of three real-world, complex, dynamic scientific workflows: ExaFEL (workflow for investigating molecular structures via X-ray diffraction), CosmoScout VR (workflow for large-scale virtual reality simulation), and Core Cosmology Library (a cosmology workflow for investigating dark matter). The proposed technique, DayDream, employs the hot start mechanism for warming up the components of the workflows by decoupling the runtime environment from the component function code to mitigate cold start overhead. DayDream optimizes the service time and service cost jointly to reduce the service time by 45% and service cost by 23% over the state-of-the-art HPC workload manager.

[SC '22] SFS: Smart OS Scheduling for Serverless Functions

  dblp   doi

Abstract

Serverless computing enables a new way of building and scaling cloud applications by allowing developers to write fine-grained serverless or cloud functions. The execution duration of a cloud function is typically short, ranging from a few milliseconds to hundreds of seconds. However, due to resource contentions caused by public clouds' deep consolidation, the function execution duration may get significantly prolonged and fail to accurately account for the function's true resource usage. We observe that the function duration can be highly unpredictable with huge amplification of more than 50× for an open-source FaaS platform (OpenLambda). Our experiments show that the OS scheduling policy of cloud functions' host server can have a crucial impact on performance. The default Linux scheduler, CFS (Completely Fair Scheduler), being oblivious to workloads, frequently context-switches short functions, causing a turnaround time that is much longer than their service time. We propose SFS (Smart Function Scheduler), which works entirely in the user space and carefully orchestrates existing Linux FIFO and CFS schedulers to approximate Shortest Remaining Time First (SRTF). SFS uses two-level scheduling that seamlessly combines a new FILTER policy with Linux CFS, to trade off increased duration of long functions for significant performance improvement for short functions. We implement SFS in the Linux user space and port it to OpenLambda. Evaluation results show that SFS significantly improves short functions' duration with a small impact on relatively longer functions, compared to CFS.
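
A toy sketch of the general two-level idea (not SFS's implementation): newly arrived invocations run in an express FIFO class for a small budget and are demoted to a fair-share class once they exceed it, so short functions finish quickly while long ones fall back to fair sharing. The budget and bookkeeping are illustrative assumptions.

```python
FILTER_BUDGET_S = 0.1  # illustrative: time a new invocation may run at FIFO priority

class TwoLevelScheduler:
    """Toy two-level policy: fresh invocations run in a FIFO 'express' class for a
    small budget; anything that exceeds it is demoted to a fair-share class,
    roughly approximating shortest-remaining-time-first for short functions."""

    def __init__(self):
        self.express, self.fair = [], []

    def admit(self, invocation_id):
        self.express.append({"id": invocation_id, "ran_s": 0.0})

    def pick_next(self):
        # Express (FIFO) tasks always run ahead of fair-share tasks.
        return self.express[0] if self.express else (self.fair[0] if self.fair else None)

    def account(self, task, ran_s, finished):
        task["ran_s"] += ran_s
        if finished:
            (self.express if task in self.express else self.fair).remove(task)
        elif task in self.express and task["ran_s"] >= FILTER_BUDGET_S:
            self.express.remove(task)
            self.fair.append(task)  # long function: demote to the fair-share class

sched = TwoLevelScheduler()
sched.admit("short-fn"); sched.admit("long-fn")
t = sched.pick_next(); sched.account(t, 0.03, finished=True)   # short fn completes fast
t = sched.pick_next(); sched.account(t, 0.12, finished=False)  # long fn exceeds budget
print([x["id"] for x in sched.express], [x["id"] for x in sched.fair])
```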

[PPOPP '22] Mashup: making serverless computing useful for HPC workflows via hybrid execution

  dblp   doi

Abstract

This work introduces Mashup, a novel strategy that leverages the serverless computing model to execute scientific workflows in a hybrid fashion by taking advantage of both the traditional VM-based cloud computing platform and the emerging serverless platform. Mashup outperforms the state-of-the-art workflow execution engines by an average of 34% and 43% in terms of execution time reduction and cost reduction, respectively, for widely-used HPC workflows on the Amazon Cloud platform (EC2 and Lambda).

[OSDI '22] ORION and the Three Rights: Sizing, Bundling, and Prewarming for Serverless DAGs

  dblp   doi

Abstract

Serverless applications represented as DAGs have been growing in popularity. For many of these applications, it would be useful to estimate the end-to-end (E2E) latency and to allocate resources to individual functions so as to meet probabilistic guarantees for the E2E latency. This goal has not been met till now due to three fundamental challenges. The first is the high variability and correlation in the execution time of individual functions, the second is the skew in execution times of the parallel invocations, and the third is the incidence of cold starts. In this paper, we introduce ORION to achieve these goals. We first analyze traces from a production FaaS infrastructure to identify three characteristics of serverless DAGs. We use these to motivate and design three features. The first is a performance model that accounts for runtime variabilities and dependencies among functions in a DAG. The second is a method for co-locating multiple parallel invocations within a single VM thus mitigating content-based skew among these invocations. The third is a method for pre-warming VMs for subsequent functions in a DAG with the right look-ahead time. We integrate these three innovations and evaluate ORION on AWS Lambda with three serverless DAG applications. Our evaluation shows that compared to three competing approaches, ORION achieves up to 90% lower P95 latency without increasing $ cost, or up to 53% lower $ cost without increasing tail latency.
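
A minimal sketch of the prewarming arithmetic (a simplification of one of ORION's three "rights"): start warming each downstream function's VM one cold-start duration before its predicted invocation time, never earlier than the DAG's start. The offsets and cold-start constant are hypothetical.

```python
COLD_START_S = 0.8  # illustrative time to boot and initialize a VM for a function

def prewarm_schedule(dag_start, predicted_offsets):
    """For each downstream function, fire the prewarm COLD_START_S before its
    predicted invocation time (but never before the DAG itself starts)."""
    return {fn: max(dag_start, dag_start + offset - COLD_START_S)
            for fn, offset in predicted_offsets.items()}

# Hypothetical per-function offsets (seconds after the DAG is triggered).
offsets = {"extract": 0.0, "transform": 2.1, "aggregate": 5.4}
print(prewarm_schedule(dag_start=100.0, predicted_offsets=offsets))
```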

[ISCA '22] Lukewarm serverless functions: characterization and optimization

  dblp   doi

Abstract

Serverless computing has emerged as a widely-used paradigm for running services in the cloud. In serverless, developers organize their applications as a set of functions, which are invoked on-demand in response to events, such as an HTTP request. To avoid long start-up delays of launching a new function instance, cloud providers tend to keep recently-triggered instances idle (or warm) for some time after the most recent invocation in anticipation of future invocations. Thus, at any given moment on a server, there may be thousands of warm instances of various functions whose executions are interleaved in time based on incoming invocations. This paper observes that (1) there is a high degree of interleaving among warm instances on a given server; (2) the individual warm functions are invoked relatively infrequently, often at the granularity of seconds or minutes; and (3) many function invocations complete within a few milliseconds. Interleaved execution of rarely invoked functions on a server leads to thrashing of each function's microarchitectural state between invocations. Meanwhile, the short execution time of a function impedes amortization of the warm-up latency of the cache hierarchy, causing a 31--114% increase in CPI compared to execution with warm microarchitectural state. We identify on-chip misses for instructions as a major contributor to the performance loss. In response we propose Jukebox, a record-and-replay instruction prefetcher specifically designed for reducing the start-up latency of warm function instances. Jukebox requires just 32KB of metadata per function instance and boosts performance by an average of 18.7% for a wide range of functions, which translates into a corresponding throughput improvement.

[ISCA '22] HiveMind: a hardware-software system stack for serverless edge swarms

  dblp   doi

Abstract

Swarms of autonomous devices are increasing in ubiquity and size, making the need for rethinking their hardware-software system stack critical. We present HiveMind, the first swarm coordination platform that enables programmable execution of complex task workflows between cloud and edge resources in a performant and scalable manner. HiveMind is a software-hardware platform that includes a domain-specific language to simplify programmability of cloud-edge applications, a program synthesis tool to automatically explore task placement strategies, a centralized controller that leverages serverless computing to elastically scale cloud resources, and a reconfigurable hardware acceleration fabric for network and remote memory accesses. We design and build the full end-to-end HiveMind system on two real edge swarms comprised of drones and robotic cars. We quantify the opportunities and challenges serverless introduces to edge applications, as well as the trade-offs between centralized and distributed coordination. We show that HiveMind achieves significantly better performance predictability and battery efficiency compared to existing centralized and decentralized platforms, while also incurring lower network traffic. Using both real systems and a validated simulator we show that HiveMind can scale to thousands of edge devices without sacrificing performance or efficiency, demonstrating that centralized platforms can be both scalable and performant.

[INFOCOM '22] Retention-Aware Container Caching for Serverless Edge Computing

  dblp   doi

Abstract

Serverless edge computing adopts an event-based model where Internet-of-Things (IoT) services are executed in lightweight containers only when requested, leading to significantly improved edge resource utilization. Unfortunately, the startup latency of containers degrades the responsiveness of IoT services dramatically. Container caching, while masking this latency, requires retaining resources thus compromising resource efficiency. In this paper, we study the retention-aware container caching problem in serverless edge computing. We leverage the distributed and heterogeneous nature of edge platforms and propose to optimize container caching jointly with request distribution. We reveal step by step that this joint optimization problem can be mapped to the classic ski-rental problem. We first present an online competitive algorithm for a special case where request distribution and container caching are based on a set of carefully designed probability distribution functions. Based on this algorithm, we propose an online algorithm called O-RDC for the general case, which incorporates the resource capacity and network latency by opportunistically distributing requests. We conduct extensive experiments to examine the performance of the proposed algorithms with both synthetic and real-world serverless computing traces. Our results show that O-RDC outperforms existing caching strategies of current serverless computing platforms by up to 94.5% in terms of the overall system cost.
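
The keep-alive decision can indeed be phrased as the classic ski-rental problem. The sketch below shows the standard deterministic 2-competitive rule (keep the container warm until the idle cost paid equals one cold start, then release it), with illustrative costs; it is the textbook rule, not the paper's O-RDC algorithm.

```python
def keep_alive_budget(cold_start_cost, keep_alive_cost_per_s):
    """Classic deterministic ski-rental: keep the container warm until the money
    spent keeping it idle equals one cold start, then release it (2-competitive)."""
    return cold_start_cost / keep_alive_cost_per_s

def should_evict(idle_seconds, cold_start_cost=0.020, keep_alive_cost_per_s=0.0004):
    # Illustrative default costs: $0.02 per cold start, $0.0004 per idle second.
    return idle_seconds >= keep_alive_budget(cold_start_cost, keep_alive_cost_per_s)

print(keep_alive_budget(0.020, 0.0004))  # keep warm for 50 s of idleness
print(should_evict(30), should_evict(60))
```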

[INFOCOM '22] StepConf: SLO-Aware Dynamic Resource Configuration for Serverless Function Workflows

  dblp   doi

Abstract

Function-as-a-Service (FaaS) offers a fine-grained resource provision model, enabling developers to build highly elastic cloud applications. User requests are handled by a series of serverless functions step by step, which forms a function-based workflow. Developers are required to set proper resource configurations for functions so as to meet service-level objectives (SLOs) and save cost. However, developing the resource configuration strategy is challenging. This is mainly because the execution of cloud functions often suffers from cold starts and performance fluctuations, which requires a dynamic configuration strategy to guarantee the SLOs. In this paper, we present StepConf, a framework that automates the resource configuration for functions as the workflow runs. StepConf optimizes memory size for each function step in the workflow and takes inter- and intra-function parallelism into consideration. We evaluate StepConf on AWS Lambda. Compared with baselines, the experimental results show that StepConf can save up to 40.9% of cost while ensuring the SLOs.
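
A hedged sketch of per-step configuration under an end-to-end SLO: exhaustively pick one memory size per step from a small profiled table and keep the cheapest plan that fits the latency budget. The step profiles are invented, and this brute-force search merely stands in for StepConf's optimization.

```python
from itertools import product

# Hypothetical per-step profiles: memory (MB) -> (latency s, cost $ per invocation).
STEPS = {
    "parse":  {512: (1.2, 0.00010), 1024: (0.7, 0.00012), 2048: (0.5, 0.00017)},
    "detect": {512: (3.0, 0.00025), 1024: (1.6, 0.00027), 2048: (1.0, 0.00034)},
    "render": {512: (0.9, 0.00008), 1024: (0.6, 0.00010), 2048: (0.5, 0.00017)},
}

def cheapest_plan(slo_s):
    """Exhaustively pick one memory size per step; return the cheapest plan whose
    summed latency stays within the workflow SLO (fine for a handful of steps)."""
    names = list(STEPS)
    best = None
    for combo in product(*(STEPS[n] for n in names)):
        lat = sum(STEPS[n][m][0] for n, m in zip(names, combo))
        cost = sum(STEPS[n][m][1] for n, m in zip(names, combo))
        if lat <= slo_s and (best is None or cost < best[0]):
            best = (cost, dict(zip(names, combo)))
    return best

print(cheapest_plan(slo_s=3.5))  # (total cost, memory size chosen per step)
```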

[EUROSYS '22] Fireworks: a fast, efficient, and safe serverless framework using VM-level post-JIT snapshot

  dblp   doi

Abstract

Serverless computing is a new paradigm that is rapidly gaining popularity in Cloud computing. One unique property in serverless computing is that the unit of deployment and execution is a serverless function, which is much smaller than a typical server program. Serverless computing introduces a new pay-as-you-go billing model and provides a high economic benefit from highly elastic resource provisioning. However, serverless computing also brings new challenges such as (1) long start-up times compared to relatively short function execution times, (2) security risks from a highly consolidated environment, and (3) memory efficiency problems from unpredictable function invocations. These problems not only degrade performance but also lower the economic benefits of Cloud providers. To address these challenges without any compromises, we propose a novel VM-level post-JIT snapshot approach and develop a new serverless framework, Fireworks. Our key idea is to synergistically leverage a virtual machine (VM)-level snapshot with a language runtime-level just-in-time (JIT) compilation in tandem. Fireworks leverages JITted serverless function code to reduce both start-up time and execution time of functions and improves memory efficiency by sharing the JITted code. Also, Fireworks can provide a high level of isolation by using a VM as a sandbox to execute a serverless function. Our evaluation results show that Fireworks outperforms state-of-the-art serverless platforms by 20.6× and provides higher memory efficiency of up to 7.3×.

[EUROSYS '22] Jiffy: elastic far-memory for stateful serverless analytics

  dblp   doi

Abstract

Stateful serverless analytics can be enabled using a remote memory system for inter-task communication, and for storing and exchanging intermediate data. However, existing systems allocate memory resources at job granularity---jobs specify their memory demands at the time of submission, and the system allocates memory equal to the job's demand for the entirety of its lifetime. This leads to resource underutilization and/or performance degradation when intermediate data sizes vary during job execution. This paper presents Jiffy, an elastic far-memory system for stateful serverless analytics that meets the instantaneous memory demand of a job at seconds timescales. Jiffy efficiently multiplexes memory capacity across concurrently running jobs, reducing the overheads of reads and writes to slower persistent storage, resulting in 1.6 -- 2.5× improvements in job execution time over production workloads. The Jiffy implementation currently runs on Amazon EC2, enables a wide variety of distributed programming models including MapReduce, Dryad, StreamScope, and Piccolo, and natively supports a large class of analytics applications on AWS Lambda.

[EUROSYS '22] Memory deduplication for serverless computing with Medes

  dblp   doi

Abstract

Serverless platforms today impose rigid trade-offs between resource use and user-perceived performance. Limited controls, provided via toggling sandboxes between warm and cold states and keep-alives, force operators to sacrifice significant resources to achieve good performance. We present a serverless framework, Medes, that breaks the rigid trade-off and allows operators to navigate the trade-off space smoothly. Medes leverages the fact that the warm sandboxes running on serverless platforms have a high fraction of duplication in their memory footprints. We exploit these redundant chunks to develop a new sandbox state, called a dedup state, that is more memory-efficient than the warm state and faster to restore from than the cold state. We develop novel mechanisms to identify memory redundancy at minimal overhead while ensuring that the dedup containers' memory footprint is small. Finally, we develop a simple sandbox management policy that exposes a narrow, intuitive interface for operators to trade off performance for memory by jointly controlling warm and dedup sandboxes. Detailed experiments with a prototype using real-world serverless workloads demonstrate that Medes can provide up to 1×-2.75× improvements in the end-to-end latencies. The benefits of Medes are enhanced in memory pressure situations, where Medes can provide up to 3.8× improvements in end-to-end latencies. Medes achieves this by reducing the number of cold starts incurred by 10--50% against the state-of-the-art baselines.
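
A minimal sketch of snapshot deduplication (much simpler than Medes's dedup state): split sandbox memory images into fixed-size chunks, store each distinct chunk once, and rebuild an image from its chunk recipe. Chunk size and inputs are illustrative.

```python
import hashlib

CHUNK = 4096  # illustrative chunk size in bytes

def dedup(snapshot: bytes, store: dict):
    """Split a sandbox memory snapshot into fixed-size chunks and keep only one
    copy of each distinct chunk; return the list of chunk hashes (the 'recipe')."""
    recipe = []
    for off in range(0, len(snapshot), CHUNK):
        chunk = snapshot[off:off + CHUNK]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)
        recipe.append(digest)
    return recipe

def restore(recipe, store):
    # Rebuild the snapshot by concatenating the stored chunks named in the recipe.
    return b"".join(store[d] for d in recipe)

store = {}
a = dedup(b"\x00" * 16384 + b"runtime-A", store)
b = dedup(b"\x00" * 16384 + b"runtime-B", store)
print(len(store))              # far fewer chunks stored than referenced
print(restore(a, store)[-9:])  # b'runtime-A'
```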

[ASPLOS '22] AQUATOPE: QoS-and-Uncertainty-Aware Resource Management for Multi-stage Serverless Workflows

  dblp   doi

Abstract

Multi-stage serverless applications, i.e., workflows with many computation and I/O stages, are becoming increasingly representative of FaaS platforms. Despite their advantages in terms of fine-grained scalability and modular development, these applications are subject to suboptimal performance, resource inefficiency, and high costs to a larger degree than previous simple serverless functions. We present Aquatope, a QoS-and-uncertainty-aware resource scheduler for end-to-end serverless workflows that takes into account the inherent uncertainty present in FaaS platforms, and improves performance predictability and resource efficiency. Aquatope uses a set of scalable and validated Bayesian models to create pre-warmed containers ahead of function invocations, and to allocate appropriate resources at function granularity to meet a complex workflow's end-to-end QoS, while minimizing resource cost. Across a diverse set of analytics and interactive multi-stage serverless workloads, Aquatope significantly outperforms prior systems, reducing QoS violations by 5X, and cost by 34% on average and up to 52% compared to other QoS-meeting methods.

[ASPLOS '22] IceBreaker: warming serverless functions better with heterogeneity

  dblp   doi

Abstract

Serverless computing, an emerging computing model, relies on "warming up" functions prior to their anticipated execution for faster and cost-effective service to users. Unfortunately, warming up functions can be inaccurate and incur prohibitively expensive cost during the warmup period (i.e., keep-alive cost). In this paper, we introduce IceBreaker, a novel technique that reduces the service time and the "keep-alive" cost by composing a system with heterogeneous nodes (costly and cheap). IceBreaker does so by dynamically determining the cost-effective node type to warm up a function based on the function's time-varying probability of the next invocation. By employing heterogeneity, IceBreaker allows for a larger number of nodes under the same cost budget and hence keeps more functions warm and reduces the wait time during high load. Our real-system evaluation confirms that IceBreaker reduces the overall keep-alive cost by 45% and execution time by 27% using representative serverless applications and industry-grade workload trace. IceBreaker is the first technique to employ and leverage the idea of mixing expensive and cheaper nodes to improve both service time and keep-alive cost for serverless functions -- opening up a new research avenue of serverless computing on heterogeneous servers for researchers and practitioners.
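
A toy expected-cost comparison in the spirit of the heterogeneous warm-up decision (not IceBreaker's model): given the probability of the next invocation, compare keeping the function warm on a fast node, on a cheap slower node, or not at all. Every price and constant below is an assumption.

```python
def keep_warm_choice(p_invoke, exec_time_s, fast_price_s, cheap_price_s, slowdown):
    """Pick where to keep a function warm for the next window: the expensive node,
    the cheap (slower) node, or nowhere, by comparing expected keep-alive + run cost.
    'slowdown' is how much longer the function runs on the cheap node."""
    window_s = 60.0       # illustrative keep-alive window
    cold_penalty_s = 1.0  # illustrative extra latency charged as cost if kept cold
    options = {
        "fast-node": window_s * fast_price_s + p_invoke * exec_time_s * fast_price_s,
        "cheap-node": window_s * cheap_price_s
                      + p_invoke * exec_time_s * slowdown * cheap_price_s,
        "no-warmup": p_invoke * (exec_time_s + cold_penalty_s) * fast_price_s,
    }
    return min(options, key=options.get), options

choice, costs = keep_warm_choice(p_invoke=0.2, exec_time_s=0.3,
                                 fast_price_s=0.00005, cheap_price_s=0.00002,
                                 slowdown=1.8)
print(choice, costs)
```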

[ASPLOS '22] INFless: a native serverless system for low-latency, high-throughput inference

  dblp   doi

Abstract

Modern websites increasingly rely on machine learning (ML) to improve their business efficiency. Developing and maintaining ML services incurs high costs for developers. Although serverless systems are a promising solution to reduce costs, we find that current general-purpose serverless systems cannot meet the low-latency, high-throughput demands of ML services. While simply "patching" general serverless systems does not resolve the problem completely, we propose that such a system should natively combine the features of inference with a serverless paradigm. We present INFless, the first ML domain-specific serverless platform. It provides a unified, heterogeneous resource abstraction between CPU and accelerators, and achieves high throughput using built-in batching and non-uniform scaling mechanisms. It also supports low latency through coordinated management of batch queuing time, execution time and cold-start rate. We evaluate INFless using both a local cluster testbed and a large-scale simulation. Experimental results show that INFless outperforms state-of-the-art systems by 2×-5× on system throughput, meeting the latency goals of ML services.

[ASPLOS '22] Serverless computing on heterogeneous computers

  dblp   doi

Abstract

Existing serverless computing platforms are built upon homogeneous computers, limiting the function density and restricting serverless computing to limited scenarios. We introduce Molecule, the first serverless computing system utilizing heterogeneous computers. Molecule enables both general-purpose devices (e.g., Nvidia DPU) and domain-specific accelerators (e.g., FPGA and GPU) for serverless applications that significantly improve function density (50% higher) and application performance (up to 34.6x). To achieve these results, we first propose XPU-Shim, a distributed shim to bridge the gap between underlying multi-OS systems (when using general-purpose devices) and our serverless runtime (i.e., Molecule). We further introduce vectorized sandbox, a sandbox abstraction to abstract hardware heterogeneity (when using domain-specific accelerators). Moreover, we also review state-of-the-art serverless optimizations on startup and communication latency and overcome the challenges to implement them on heterogeneous computers. We have implemented Molecule on real platforms with Nvidia DPUs and Xilinx FPGAs and evaluate it using benchmarks and real-world applications.

[USENIX '21] SONIC: Application-aware Data Passing for Chained Serverless Applications

  dblp   doi

Abstract

Data analytics applications are increasingly leveraging serverless execution environments for their ease-of-use and pay-as-you-go billing. The structure of such applications is usually composed of multiple functions that are chained together to form a workflow. The current approach of exchanging intermediate (ephemeral) data between functions is through remote storage (such as S3), which introduces significant performance overhead. We compare three data-passing methods, which we call VM-Storage, Direct-Passing, and state-of-practice Remote-Storage. Crucially, we show that no single data-passing method prevails under all scenarios and the optimal choice depends on dynamic factors such as the size of input data, the size of intermediate data, the application's degree of parallelism, and network bandwidth. We propose SONIC, a data-passing manager that optimizes application performance and cost, by transparently selecting the optimal data-passing method for each edge of a serverless workflow DAG and implementing communication-aware function placement. SONIC monitors application parameters and uses simple regression models to adapt its hybrid data passing accordingly. We integrate SONIC with OpenLambda and evaluate the system on Amazon EC2 with three analytics applications, popular in the serverless environment. SONIC provides lower latency (raw performance) and higher performance/$ across diverse conditions, compared to four baselines: SAND, vanilla OpenLambda, OpenLambda with Pocket, and AWS Lambda.
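
A hedged sketch of per-edge method selection: estimate the latency of VM-Storage, Direct-Passing, and Remote-Storage from a few edge parameters and pick the minimum. The cost formulas here are invented placeholders, not SONIC's fitted regression models.

```python
def pick_passing_method(data_mb, fanout, bandwidth_mbps, remote_rtt_s=0.05):
    """Estimate per-edge latency of the three data-passing methods and pick the
    lowest. The models are made up for illustration only."""
    estimates = {
        # Keep data on the producer VM; consumers must be scheduled there.
        "vm-storage": data_mb / bandwidth_mbps * 0.1,
        # Send the data directly from the producer to each consumer.
        "direct-passing": data_mb * fanout / bandwidth_mbps,
        # Upload once to remote storage, then each consumer downloads it.
        "remote-storage": 2 * remote_rtt_s + data_mb * (1 + fanout) / bandwidth_mbps,
    }
    return min(estimates, key=estimates.get), estimates

print(pick_passing_method(data_mb=200, fanout=4, bandwidth_mbps=125))
```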

[USENIX '21] FaaSNet: Scalable and Fast Provisioning of Custom Serverless Container Runtimes at Alibaba Cloud Function Compute

  dblp   doi

Abstract

Serverless computing, or Function-as-a-Service (FaaS), enables a new way of building and scaling applications by allowing users to deploy fine-grained functions while providing fully-managed resource provisioning and auto-scaling. Custom FaaS container support is gaining traction as it enables better control over OSes, versioning, and tooling for modernizing FaaS applications. However, providing rapid container provisioning introduces non-trivial challenges for FaaS providers, since container provisioning is costly, and real-world FaaS workloads exhibit highly dynamic patterns. In this paper, we design FaaSNet, a highly-scalable middleware system for accelerating FaaS container provisioning. FaaSNet is driven by the workload and infrastructure requirements of the FaaS platform at one of the world's largest cloud providers, Alibaba Cloud Function Compute. FaaSNet enables scalable container provisioning via a lightweight, adaptive function tree (FT) structure. FaaSNet uses an I/O efficient, on-demand fetching mechanism to further reduce provisioning costs at scale. We implement and integrate FaaSNet in Alibaba Cloud Function Compute. Evaluation results show that FaaSNet: (1) finishes provisioning 2,500 function containers on 1,000 virtual machines in 8.3 seconds, (2) scales 13.4× and 16.3× faster than Alibaba Cloud's current FaaS platform and a state-of-the-art P2P container registry (Kraken), respectively, and (3) sustains a bursty workload using 75.2% less time than an optimized baseline.

[SOSP '21] Boki: Stateful Serverless Computing with Shared Logs

  dblp   doi

Abstract

Boki is a new serverless runtime that exports a shared log API to serverless functions. Boki shared logs enable stateful serverless applications to manage their state with durability, consistency, and fault tolerance. Boki shared logs achieve high throughput and low latency. The key enabler is the metalog, a novel mechanism that allows Boki to address ordering, consistency and fault tolerance independently. The metalog orders shared log records with high throughput and it provides read consistency while allowing service providers to optimize the write and read path of the shared log in different ways. To demonstrate the value of shared logs for stateful serverless applications, we build Boki support libraries that implement fault-tolerant workflows, durable object storage, and message queues. Our evaluation shows that shared logs can speed up important serverless workloads by up to 4.7x.

[SOSP '21] Faster and Cheaper Serverless Computing on Harvested Resources

  dblp   doi

Abstract

Serverless computing is becoming increasingly popular due to its ease of programming, fast elasticity, and fine-grained billing. However, the serverless provider still needs to provision, manage, and pay the IaaS provider for the virtual machines (VMs) hosting its platform. This ties the cost of the serverless platform to the cost of the underlying VMs. One way to significantly reduce cost is to use spare resources, which cloud providers rent at a massive discount. Harvest VMs offer such cheap resources: they grow and shrink to harvest all the unallocated CPU cores in their host servers, but may be evicted to make room for more expensive VMs. Thus, using Harvest VMs to run the serverless platform comes with two main challenges that must be carefully managed: VM evictions and dynamically varying resources in each VM. In this work, we explore the challenges and benefits of hosting serverless (Function as a Service or simply FaaS) platforms on Harvest VMs. We characterize the serverless workloads and Harvest VMs of Microsoft Azure, and design a serverless load balancer that is aware of evictions and resource variations in Harvest VMs. We modify OpenWhisk, a widely-used open-source serverless platform, to monitor harvested resources and balance the load accordingly, and evaluate it experimentally. Our results show that adopting harvested resources improves efficiency and reduces cost. Under the same cost budget, running serverless platforms on harvested resources achieves 2.2x to 9.0x higher throughput compared to using dedicated resources. When using the same amount of resources, running serverless platforms on harvested resources achieves 48% to 89% cost savings with lower latency due to better load balancing.

[SC '21] Understanding, predicting and scheduling serverless workloads under partial interference

  dblp   doi

Abstract

Interference among distributed cloud applications can be classified into three types: full, partial and zero. While prior research merely focused on full interference, the partial interference that occurs at parts of applications is far more common yet still lacks in-depth study. Serverless computing, which structures applications into small-sized, short-lived functions, further exacerbates partial interference. We characterize the features of partial interference in serverless as exhibiting high volatility, spatial-temporal variation, and propagation. Given these observations, we propose an incremental learning predictor, named Gsight, which can achieve high precision by harnessing the spatial-temporal overlap codes and profiles of functions via an end-to-end call path. Experimental results show that Gsight can achieve an average error of 1.71%. Its convergence speed is at least 3X faster than that in a serverful system. A scheduling case study shows that the proposed method can improve function density by ≥ 18.79% while guaranteeing the quality of service (QoS).

[OSDI '21] Dorylus: Affordable, Scalable, and Accurate GNN Training with Distributed CPU Servers and Serverless Threads

  dblp   doi

Abstract

A graph neural network (GNN) enables deep learning on structured graph data. There are two major GNN training obstacles: 1) it relies on high-end servers with many GPUs which are expensive to purchase and maintain, and 2) limited memory on GPUs cannot scale to today's billion-edge graphs. This paper presents Dorylus: a distributed system for training GNNs. Uniquely, Dorylus can take advantage of serverless computing to increase scalability at a low cost. The key insight guiding our design is computation separation. Computation separation makes it possible to construct a deep, bounded-asynchronous pipeline where graph and tensor parallel tasks can fully overlap, effectively hiding the network latency incurred by Lambdas. With the help of thousands of Lambda threads, Dorylus scales GNN training to billion-edge graphs. Currently, for large graphs, CPU servers offer the best performance-per-dollar over GPU servers. Just using Lambdas on top of CPU servers offers up to 2.75✕ more performance-per-dollar than training only with CPU servers. Concretely, Dorylus is 1.22✕ faster and 4.83✕ cheaper than GPU servers for massive sparse graphs. Dorylus is up to 3.8✕ faster and 10.7✕ cheaper compared to existing sampling-based systems.

[NSDI '21] Caerus: NIMBLE Task Scheduling for Serverless Analytics

  dblp   doi

Abstract

Serverless platforms facilitate transparent resource elasticity and fine-grained billing, making them an attractive choice for data analytics. We find that while server-centric analytics frameworks typically optimize for job completion time (JCT), resource utilization and isolation via inter-job scheduling policies, serverless analytics requires optimizing for JCT and cost of execution instead, introducing a new scheduling problem. We present Caerus, a task scheduler for serverless analytics frameworks that employs a fine-grained NIMBLE scheduling algorithm to solve this problem. NIMBLE efficiently pipelines task executions within a job, minimizing execution cost while being Pareto-optimal between cost and JCT for arbitrary analytics jobs. To this end, NIMBLE models a wide range of execution parameters --- pipelineable and non-pipelineable data dependencies, data generation, consumption and processing rates, etc. --- to determine the ideal task launch times. Our evaluation results show that in practice, Caerus is able to achieve both optimal cost and JCT for queries across a wide range of analytics workloads.

[ISCA '21] Confidential Serverless Made Efficient with Plug-In Enclaves

  dblp   doi

Abstract

Serverless computing has become a fact of life on modern clouds. A serverless function may process sensitive data from clients. Protecting such a function against untrusted clouds using hardware enclaves is attractive for user privacy. In this work, we run existing serverless applications in SGX enclaves, and observe that the performance degradation can be as high as 5.6× to even 422.6×. Our investigation identifies that these slowdowns are related to architectural features, mainly from page-wise enclave initialization. Leveraging insights from our overhead analysis, we revisit SGX hardware design and make minimal modification to its enclave model. We extend SGX with a new primitive—region-wise plugin enclaves that can be mapped into existing enclaves to reuse attested common states amongst functions. By remapping plugin enclaves, an enclave allows in-situ processing to avoid expensive data movement in a function chain. Experiments show that our design reduces the enclave function latency by 94.74-99.57%, and boosts the autoscaling throughput by 19-179×.

[ASPLOS '21] Nightcore: efficient and scalable serverless computing for latency-sensitive, interactive microservices

  dblp   doi

Abstract

The microservice architecture is a popular software engineering approach for building flexible, large-scale online services. Serverless functions, or function as a service (FaaS), provide a simple programming model of stateless functions which are a natural substrate for implementing the stateless RPC handlers of microservices, as an alternative to containerized RPC servers. However, current serverless platforms have millisecond-scale runtime overheads, making them unable to meet the strict sub-millisecond latency targets required by existing interactive microservices. We present Nightcore, a serverless function runtime with microsecond-scale overheads that provides container-based isolation between functions. Nightcore’s design carefully considers various factors having microsecond-scale overheads, including scheduling of function requests, communication primitives, threading models for I/O, and concurrent function executions. Nightcore currently supports serverless functions written in C/C++, Go, Node.js, and Python. Our evaluation shows that when running latency-sensitive interactive microservices, Nightcore achieves 1.36×–2.93× higher throughput and up to 69% reduction in tail latency.

[ASPLOS '21] FaasCache: keeping serverless computing alive with greedy-dual caching

  dblp   doi

Abstract

Functions as a Service (also called serverless computing) promises to revolutionize how applications use cloud resources. However, functions suffer from cold-start problems due to the overhead of initializing their code and data dependencies before they can start executing. Keeping functions alive and warm after they have finished execution can alleviate the cold-start overhead. Keep-alive policies must keep functions alive based on their resource and usage characteristics, which is challenging due to the diversity in FaaS workloads. Our insight is that keep-alive is analogous to caching. Our caching-inspired Greedy-Dual keep-alive policy can be effective in reducing the cold-start overhead by more than 3× compared to current approaches. Caching concepts such as reuse distances and hit-ratio curves can also be used for auto-scaled server resource provisioning, which can reduce the resource requirement of FaaS providers by 30% for real-world dynamic workloads. We implement caching-based keep-alive and resource provisioning policies in our FaasCache system, which is based on OpenWhisk. We hope that our caching analogy opens the door to more principled and optimized keep-alive and resource provisioning techniques for future FaaS workloads and platforms.
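
A minimal sketch of a Greedy-Dual-style keep-alive policy (a simplification of FaasCache's): each warm container gets a priority of clock + frequency * initialization cost / memory size, and the lowest-priority container is evicted under memory pressure, so recently and frequently used, expensive-to-initialize, small containers stay warm longest. All numbers are illustrative.

```python
class GreedyDualKeepAlive:
    """Toy Greedy-Dual keep-alive: priority = clock + freq * init_cost / memory.
    On memory pressure, evict the warm container with the lowest priority and
    advance the clock to that priority (the 'aging' step of Greedy-Dual)."""

    def __init__(self, capacity_mb):
        self.capacity, self.used, self.clock = capacity_mb, 0, 0.0
        self.warm = {}  # func -> dict(freq, init_cost, mem, priority)

    def _evict_until(self, need_mb):
        while self.warm and self.used + need_mb > self.capacity:
            victim = min(self.warm, key=lambda f: self.warm[f]["priority"])
            self.clock = self.warm[victim]["priority"]
            self.used -= self.warm.pop(victim)["mem"]

    def on_finish(self, func, init_cost, mem):
        entry = self.warm.get(func, {"freq": 0, "init_cost": init_cost, "mem": mem})
        if func not in self.warm:
            self._evict_until(mem)
            self.used += mem
        entry["freq"] += 1
        entry["priority"] = self.clock + entry["freq"] * entry["init_cost"] / entry["mem"]
        self.warm[func] = entry

cache = GreedyDualKeepAlive(capacity_mb=1024)
cache.on_finish("thumbs", init_cost=400, mem=512)
cache.on_finish("ocr", init_cost=900, mem=512)
cache.on_finish("hello", init_cost=50, mem=512)  # evicts the lowest-priority entry
print(sorted(cache.warm))
```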

[ASPLOS '21] Benchmarking, analysis, and optimization of serverless function snapshots

  dblp   doi

Abstract

Serverless computing has seen rapid adoption due to its high scalability and flexible, pay-as-you-go billing model. In serverless, developers structure their services as a collection of functions, sporadically invoked by various events like clicks. High inter-arrival time variability of function invocations motivates the providers to start new function instances upon each invocation, leading to significant cold-start delays that degrade user experience. To reduce cold-start latency, the industry has turned to snapshotting, whereby an image of a fully-booted function is stored on disk, enabling a faster invocation compared to booting a function from scratch. This work introduces vHive, an open-source framework for serverless experimentation with the goal of enabling researchers to study and innovate across the entire serverless stack. Using vHive, we characterize a state-of-the-art snapshot-based serverless infrastructure, based on industry-leading Containerd orchestration framework and Firecracker hypervisor technologies. We find that the execution time of a function started from a snapshot is 95% higher, on average, than when the same function is memory-resident. We show that the high latency is attributable to frequent page faults as the function's state is brought from disk into guest memory one page at a time. Our analysis further reveals that functions access the same stable working set of pages across different invocations of the same function. By leveraging this insight, we build REAP, a light-weight software mechanism for serverless hosts that records functions' stable working set of guest memory pages and proactively prefetches it from disk into memory. Compared to baseline snapshotting, REAP slashes the cold-start delays by 3.7x, on average.

[USENIX '20] Serverless in the Wild: Characterizing and Optimizing the Serverless Workload at a Large Cloud Provider

  dblp   doi

Abstract

Function as a Service (FaaS) has been gaining popularity as a way to deploy computations to serverless backends in the cloud. This paradigm shifts the complexity of allocating and provisioning resources to the cloud provider, which has to provide the illusion of always-available resources (i.e., fast function invocations without cold starts) at the lowest possible resource cost. Doing so requires the provider to deeply understand the characteristics of the FaaS workload. Unfortunately, there has been little to no public information on these characteristics. Thus, in this paper, we first characterize the entire production FaaS workload of Azure Functions. We show for example that most functions are invoked very infrequently, but there is an 8-order-of-magnitude range of invocation frequencies. Using observations from our characterization, we then propose a practical resource management policy that significantly reduces the number of function cold starts, while spending fewer resources than state-of-the-practice policies.

[USENIX '20] Faasm: Lightweight Isolation for Efficient Stateful Serverless Computing

  dblp   doi

Abstract

Serverless computing is an excellent fit for big data processing because it can scale quickly and cheaply to thousands of parallel functions. Existing serverless platforms isolate functions in ephemeral, stateless containers, preventing them from directly sharing memory. This forces users to duplicate and serialise data repeatedly, adding unnecessary performance and resource costs. We believe that a new lightweight isolation approach is needed, which supports sharing memory directly between functions and reduces resource overheads. We introduce Faaslets, a new isolation abstraction for high-performance serverless computing. Faaslets isolate the memory of executed functions using \emph{software-fault isolation} (SFI), as provided by WebAssembly, while allowing memory regions to be shared between functions in the same address space. Faaslets can thus avoid expensive data movement when functions are co-located on the same machine. Our runtime for Faaslets, Faasm, isolates other resources, e.g. CPU and network, using standard Linux cgroups, and provides a low-level POSIX host interface for networking, file system access and dynamic loading. To reduce initialisation times, Faasm restores Faaslets from already-initialised snapshots. We compare Faasm to a standard container-based platform and show that, when training a machine learning model, it achieves a 2× speed-up with 10× less memory; for serving machine learning inference, Faasm doubles the throughput and reduces tail latency by 90%.
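A tiny sketch of the zero-copy sharing idea follows, using Python's multiprocessing.shared_memory as a stand-in for the shared regions Faaslets expose within one address space; the WebAssembly-based software-fault isolation that makes such sharing safe is not modelled here.

```python
# Two co-located "functions" operate on a single shared memory region instead
# of serialising state through external storage.
import numpy as np
from multiprocessing import shared_memory

region = shared_memory.SharedMemory(create=True, size=8 * 1024)

def producer_function():
    data = np.ndarray((1024,), dtype=np.float64, buffer=region.buf)
    data[:] = np.arange(1024)                 # written in place, no copy

def consumer_function():
    data = np.ndarray((1024,), dtype=np.float64, buffer=region.buf)
    return float(data.sum())                  # read in place, no deserialisation

producer_function()
print(consumer_function())                    # 523776.0
region.close()
region.unlink()
```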

[SC '20] Batch: machine learning inference serving on serverless platforms with adaptive batching

  dblp   doi

Abstract

Serverless computing is a new pay-per-use cloud service paradigm that automates resource scaling for stateless functions and can potentially facilitate bursty machine learning serving. Batching is critical for latency performance and cost-effectiveness of machine learning inference, but unfortunately it is not supported by existing serverless platforms due to their stateless design. Our experiments show that without batching, machine learning serving cannot reap the benefits of serverless computing. In this paper, we present BATCH, a framework for supporting efficient machine learning serving on serverless platforms. BATCH uses an optimizer to provide inference tail latency guarantees and cost optimization and to enable adaptive batching support. We prototype BATCH atop AWS Lambda and popular machine learning inference systems. The evaluation verifies the accuracy of the analytic optimizer and demonstrates performance and cost advantages over the state-of-the-art method MArk and the state-of-the-practice tool SageMaker.
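The core mechanism implied here is a buffer that flushes on either a size or a timeout trigger, with the optimizer choosing those two knobs. Below is a minimal sketch of such a dispatcher; the fixed batch size, timeout, and handler are placeholders, not BATCH's tuned values.

```python
# Hold incoming inference requests until either a target batch size or a
# timeout is reached, then dispatch them together.
import threading
import time

class BatchingDispatcher:
    def __init__(self, handler, max_batch=8, timeout_s=0.05):
        self.handler = handler                # called with a list of requests
        self.max_batch = max_batch
        self.timeout_s = timeout_s
        self.buffer = []
        self.deadline = None
        self.lock = threading.Lock()

    def submit(self, request):
        with self.lock:
            if not self.buffer:
                self.deadline = time.monotonic() + self.timeout_s
            self.buffer.append(request)
            if len(self.buffer) >= self.max_batch:
                self._flush_locked()          # size trigger

    def poll(self):
        """Call periodically; flushes a partial batch once its timeout expires."""
        with self.lock:
            if self.buffer and time.monotonic() >= self.deadline:
                self._flush_locked()          # timeout trigger

    def _flush_locked(self):
        batch, self.buffer = self.buffer, []
        self.handler(batch)

dispatcher = BatchingDispatcher(lambda batch: print(f"inference on {len(batch)} inputs"))
for i in range(10):
    dispatcher.submit({"input": i})
time.sleep(0.06)
dispatcher.poll()                             # prints batches of 8 and 2
```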

[OSDI '20] Fault-tolerant and transactional stateful serverless workflows

  dblp   doi

Abstract

This paper introduces Beldi, a library and runtime system for writing and composing fault-tolerant and transactional stateful serverless functions. Beldi runs on existing providers and lets developers write complex stateful applications that require fault tolerance and transactional semantics without the need to deal with tasks such as load balancing or maintaining virtual machines. Beldi’s contributions include extending the log-based fault-tolerant approach in Olive (OSDI 2016) with new data structures, transaction protocols, function invocations, and garbage collection. They also include adapting the resulting framework to work over a federated environment where each serverless function has sovereignty over its own data. We implement three applications on Beldi, including a movie review service, a travel reservation system, and a social media site. Our evaluation on 1,000 AWS Lambdas shows that Beldi’s approach is effective and affordable.
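A conceptual sketch of the logging discipline such systems rely on: each step's result is recorded under a unique key, so a re-executed function replays logged results instead of repeating side effects. The dict-based log, the step names, and the reserve_flight helper are illustrative; Beldi's actual data structures, transaction protocol, and garbage collection are richer.

```python
# Idempotent step execution via a durable step log (an in-memory dict here
# stands in for the log Beldi keeps in cloud storage).
step_log = {}   # (workflow_id, step_id) -> recorded result

def run_step(workflow_id, step_id, effect, *args):
    key = (workflow_id, step_id)
    if key in step_log:                 # re-execution after a crash or retry
        return step_log[key]            # replay; do not repeat the side effect
    result = effect(*args)              # perform the step
    step_log[key] = result              # record it before moving on
    return result

def reserve_flight(order_id):
    print(f"reserving flight for order {order_id}")   # the visible side effect
    return {"confirmation": f"FL-{order_id}"}

# The first run performs the reservation; a retried run only replays the log.
print(run_step("wf-42", "step-1", reserve_flight, "order-7"))
print(run_step("wf-42", "step-1", reserve_flight, "order-7"))
```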

[NSDI '20] Firecracker: Lightweight Virtualization for Serverless Applications

  dblp   doi

Abstract

Serverless containers and functions are widely used for deploying and managing software in the cloud. Their popularity is due to reduced cost of operations, improved utilization of hardware, and faster scaling than traditional deployment methods. The economics and scale of serverless applications demand that workloads from multiple customers run on the same hardware with minimal overhead, while preserving strong security and performance isolation. The traditional view is that there is a choice between virtualization with strong security and high overhead, and container technologies with weaker security and minimal overhead. This tradeoff is unacceptable to public infrastructure providers, who need both strong security and minimal overhead. To meet this need, we developed Firecracker, a new open source Virtual Machine Monitor (VMM) specialized for serverless workloads, but generally useful for containers, functions and other compute workloads within a reasonable set of constraints. We have deployed Firecracker in two publicly available serverless compute services at AWS (Lambda and Fargate), where it supports millions of production workloads, and trillions of requests per month. We describe how specializing for serverless informed the design of Firecracker, and what we learned from seamlessly migrating AWS Lambda customers to Firecracker.

[INFOCOM '20] COSE: Configuring Serverless Functions using Statistical Learning

  dblp   doi

Abstract

Serverless computing has emerged as a new compelling paradigm for the deployment of applications and services. It represents an evolution of cloud computing with a simplified programming model that aims to abstract away most operational concerns. Running serverless functions requires users to configure multiple parameters, such as memory, CPU, cloud provider, etc. While this configuration is simpler than managing infrastructure, choosing the parameters correctly while minimizing cost and meeting delay constraints is not trivial. In this paper, we present COSE, a framework that uses Bayesian Optimization to find the optimal configuration for serverless functions. COSE uses statistical learning techniques to intelligently collect samples and predict the cost and execution time of a serverless function across unseen configuration values. Our framework uses the predicted cost and execution time to select the "best" configuration parameters for running a single function or a chain of functions while satisfying customer objectives. In addition, COSE has the ability to adapt to changes in the execution time of a serverless function. We evaluate COSE not only on a commercial cloud provider, where we successfully found optimal/near-optimal configurations in as few as five samples, but also over a wide range of simulated distributed cloud environments that confirm the efficacy of our approach.
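As an illustration of the kind of search loop such a framework runs, the sketch below uses a Gaussian process and expected improvement to pick the next memory size to sample; the measure_cost stub, the candidate grid, and the single cost objective are assumptions, and COSE's real models also cover execution time, delay constraints, function chains, and multiple providers.

```python
# Bayesian-optimization loop over a function's memory configuration.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

def measure_cost(memory_mb):
    """Stub: deploy at memory_mb, invoke the function, return the observed cost."""
    exec_s = 2000.0 / memory_mb + 0.2               # made-up latency curve
    return (memory_mb / 1024.0) * exec_s            # roughly GB-seconds

candidates = np.arange(128, 3009, 64, dtype=float).reshape(-1, 1)
X = np.array([[128.0], [3008.0]])                   # two seed samples
y = np.array([measure_cost(m[0]) for m in X])

for _ in range(5):
    gp = GaussianProcessRegressor(normalize_y=True).fit(X, y)
    mu, sigma = gp.predict(candidates, return_std=True)
    imp = y.min() - mu                               # improvement (minimization)
    z = np.where(sigma > 0, imp / sigma, 0.0)
    ei = imp * norm.cdf(z) + sigma * norm.pdf(z)     # expected improvement
    nxt = candidates[np.argmax(ei)]
    X = np.vstack([X, nxt])
    y = np.append(y, measure_cost(nxt[0]))

print(f"best memory size sampled so far: {X[np.argmin(y)][0]:.0f} MB")
```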

[FAST '20] InfiniCache: Exploiting Ephemeral Serverless Functions to Build a Cost-Effective Memory Cache

  dblp   doi

Abstract

Internet-scale web applications are becoming increasingly storage-intensive and rely heavily on in-memory object caching to attain required I/O performance. We argue that the emerging serverless computing paradigm provides a well-suited, cost-effective platform for object caching. We present InfiniCache, a first-of-its-kind in-memory object caching system that is completely built and deployed atop ephemeral serverless functions. InfiniCache exploits and orchestrates serverless functions' memory resources to enable elastic pay-per-use caching. InfiniCache's design combines erasure coding, intelligent billed duration control, and an efficient data backup mechanism to maximize data availability and cost-effectiveness while balancing the risk of losing cached state and performance. We implement InfiniCache on AWS Lambda and show that it: (1) achieves 31 – 96× tenant-side cost savings compared to AWS ElastiCache for a large-object-only production workload, (2) can effectively provide 95.4% data availability for each one hour window, and (3) achieves performance comparable to a typical in-memory cache.
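For intuition about the erasure-coding ingredient, here is the simplest possible scheme: k data chunks plus one XOR parity chunk, which can rebuild any single lost chunk. InfiniCache itself uses Reed-Solomon coding (tolerating multiple losses) together with billed-duration control and backups, so this is only a toy example.

```python
# Stripe an object into k data chunks plus one XOR parity chunk; rebuild any
# single lost chunk from the survivors.
def encode(data, k):
    chunk_len = -(-len(data) // k)                       # ceil division
    chunks = [data[i * chunk_len:(i + 1) * chunk_len].ljust(chunk_len, b"\0")
              for i in range(k)]
    parity = bytearray(chunk_len)
    for chunk in chunks:
        for i, byte in enumerate(chunk):
            parity[i] ^= byte
    return chunks + [bytes(parity)]                      # k data chunks + parity

def recover(stripes, lost_index):
    chunk_len = len(next(s for s in stripes if s is not None))
    rebuilt = bytearray(chunk_len)
    for idx, stripe in enumerate(stripes):
        if idx == lost_index:
            continue
        for i, byte in enumerate(stripe):
            rebuilt[i] ^= byte                           # XOR of survivors = lost chunk
    return bytes(rebuilt)

stripes = encode(b"cached object payload", k=4)
stripes[2] = None                                        # one function's memory was reclaimed
stripes[2] = recover(stripes, 2)
print(b"".join(stripes[:4]).rstrip(b"\0"))               # b'cached object payload'
```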

[EUROSYS '20] A fault-tolerance shim for serverless computing

  dblp   doi

Abstract

Serverless computing has grown in popularity in recent years, with an increasing number of applications being built on Functions-as-a-Service (FaaS) platforms. By default, FaaS platforms support retry-based fault tolerance, but this is insufficient for programs that modify shared state, as they can unwittingly persist partial sets of updates in case of failures. To address this challenge, we would like atomic visibility of the updates made by a FaaS application. In this paper, we present aft, an atomic fault tolerance shim for serverless applications. aft interposes between a commodity FaaS platform and storage engine and ensures atomic visibility of updates by enforcing the read atomic isolation guarantee. aft supports new protocols to guarantee read atomic isolation in the serverless setting. We demonstrate that aft introduces minimal overhead relative to existing storage engines and scales smoothly to thousands of requests per second, while preventing a significant number of consistency anomalies.
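A minimal sketch of the atomic-visibility rule: install a transaction's versions first, write the commit record last, and have readers skip versions without a commit record, so they never see a partial set of updates. The dict-based storage and helper names are assumptions; aft's full read atomic protocol, which also keeps repeated reads mutually consistent, is omitted.

```python
# Writes become visible only once their transaction's commit record exists.
import uuid

versions = {}        # key -> list of (txn_id, value)
committed = set()    # txn ids that have a commit record

def run_transaction(writes):
    txn = uuid.uuid4().hex
    for key, value in writes.items():                  # install versions first
        versions.setdefault(key, []).append((txn, value))
    committed.add(txn)                                 # commit record goes last
    return txn

def read(key):
    for txn, value in reversed(versions.get(key, [])):
        if txn in committed:                           # ignore uncommitted writes
            return value
    return None

run_transaction({"cart:alice": ["book"], "total:alice": 12})
versions.setdefault("total:alice", []).append(("crashed-txn", 99))   # never committed
print(read("cart:alice"), read("total:alice"))         # ['book'] 12
```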

[EUROSYS '20] SEUSS: skip redundant paths to make serverless fast

  dblp   doi

Abstract

This paper presents a system-level method for achieving the rapid deployment and high-density caching of serverless functions in a FaaS environment. For reduced start times, functions are deployed from unikernel snapshots, bypassing expensive initialization steps. To reduce the memory footprint of snapshots we apply page-level sharing across the entire software stack that is required to run a function. We demonstrate the effects of our techniques by replacing Linux on the compute node of a FaaS platform architecture. With our prototype OS, the deployment time of a function drops from hundreds of milliseconds to under 10 ms. Platform throughput improves by 51x on a workload composed entirely of new functions. We are able to cache over 50,000 function instances in memory as opposed to 3,000 using standard OS techniques. In combination, these improvements give the FaaS platform a new ability to handle large-scale bursts of requests.

[ASPLOS '20] Catalyzer: Sub-millisecond Startup for Serverless Computing with Initialization-less Booting

  dblp   doi

Abstract

Serverless computing promises cost-efficiency and elasticity for highly productive software development. To achieve this, the serverless sandbox system must address two challenges: strong isolation between function instances, and low startup latency to ensure user experience. While strong isolation can be provided by virtualization-based sandboxes, the initialization of the sandbox and application causes non-negligible startup overhead. Conventional sandbox systems fall short in low-latency startup due to their application-agnostic nature: they can only reduce the latency of sandbox initialization through hypervisor and guest kernel customization, which is inadequate and does not mitigate the majority of startup overhead. This paper proposes Catalyzer, a serverless sandbox system design providing both strong isolation and extremely fast function startup. Instead of booting from scratch, Catalyzer restores a virtualization-based function instance from a well-formed checkpoint image and thereby skips the initialization on the critical path (init-less). Catalyzer boosts the restore performance by on-demand recovering both user-level memory state and system state. We also propose a new OS primitive, sfork (sandbox fork), to further reduce the startup latency by directly reusing the state of a running sandbox instance. Fundamentally, Catalyzer removes the initialization cost by reusing state, which enables general optimizations for diverse serverless functions. The evaluation shows that Catalyzer reduces startup latency by orders of magnitude, achieves <1 ms latency in the best case, and significantly reduces the end-to-end latency for real-world workloads. Catalyzer has been adopted by Ant Financial, and we also present lessons learned from industrial development.
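The restore-instead-of-initialize contrast can be illustrated at the application level with a checkpoint file, as in the sketch below; the pickle-based checkpoint and cold_init stub are stand-ins, since Catalyzer restores entire sandboxes (guest memory and system state) and adds the sfork primitive, neither of which is modelled here.

```python
# Initialize once, checkpoint the resulting state, and let later starts load
# the checkpoint instead of rebuilding everything.
import os
import pickle
import tempfile
import time

CHECKPOINT = os.path.join(tempfile.gettempdir(), "function_state.pkl")   # illustrative path

def cold_init():
    time.sleep(0.5)                            # stands in for loading models/config
    return {"model": list(range(100_000))}

def invoke():
    try:
        with open(CHECKPOINT, "rb") as f:      # init-less path: restore saved state
            state = pickle.load(f)
    except FileNotFoundError:                  # very first start: init, then checkpoint
        state = cold_init()
        with open(CHECKPOINT, "wb") as f:
            pickle.dump(state, f)
    return len(state["model"])

for _ in range(2):
    start = time.perf_counter()
    invoke()
    print(f"invocation took {time.perf_counter() - start:.3f}s")
```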

[NSDI '19] Shuffling, Fast and Slow: Scalable Analytics on Serverless Infrastructure

  dblp   doi

Abstract

Serverless computing is poised to fulfill the long-held promise of transparent elasticity and millisecond-level pricing. To achieve this goal, service providers impose a fine-grained computational model where every function has a maximum duration, a fixed amount of memory and no persistent local storage. We observe that the fine-grained elasticity of serverless is key to achieve high utilization for general computations such as analytics workloads, but that resource limits make it challenging to implement such applications as they need to move large amounts of data between functions that don't overlap in time. In this paper, we present Locus, a serverless analytics system that judiciously combines (1) cheap but slow storage with (2) fast but expensive storage, to achieve good performance while remaining cost-efficient. Locus applies a performance model to guide users in selecting the type and the amount of storage to achieve the desired cost-performance trade-off. We evaluate Locus on a number of analytics applications including TPC-DS, CloudSort, and Big Data Benchmark, and show that Locus can navigate the cost-performance trade-off, leading to 4×-500× performance improvements over a slow-storage-only baseline, reducing resource usage by up to 59% while achieving comparable performance to a cluster of virtual machines, and staying within 1.99× of Redshift's performance.
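A toy version of the storage-mix decision such a performance model makes is sketched below, assuming invented throughput and price numbers; it only shows the shape of the trade-off, not Locus's calibrated model.

```python
# Estimate shuffle time and cost for each candidate amount of fast storage and
# keep the cheapest option that meets a deadline.
def pick_storage_mix(shuffle_gb, deadline_s,
                     slow_gbps=1.0, fast_gbps=10.0,
                     slow_cost_per_gb=0.00004, fast_cost_per_gb_s=0.00002):
    best = None
    for fast_gb in range(0, int(shuffle_gb) + 1, 10):
        slow_gb = shuffle_gb - fast_gb
        time_s = max(slow_gb / slow_gbps, fast_gb / fast_gbps)   # read in parallel
        cost = slow_gb * slow_cost_per_gb + fast_gb * fast_cost_per_gb_s * time_s
        if time_s <= deadline_s and (best is None or cost < best[2]):
            best = (fast_gb, time_s, cost)
    return best   # (GB of fast storage, estimated seconds, estimated $) or None

print(pick_storage_mix(shuffle_gb=500, deadline_s=120))
```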

[INFOCOM '19] Distributed Machine Learning with a Serverless Architecture

  dblp   doi

Abstract

The need to scale up machine learning, in the presence of a rapid growth of data both in volume and in variety, has sparked broad interest in developing distributed machine learning systems, typically based on parameter servers. However, since these systems are based on a dedicated cluster of physical or virtual machines, they impose non-trivial cluster management overhead on machine learning practitioners and data scientists. In addition, there exists an inherent mismatch between the dynamically varying resource demands during a model training job and the inflexible resource provisioning model of current cluster-based systems. In this paper, we propose SIREN, an asynchronous distributed machine learning framework based on the emerging serverless architecture, with which stateless functions can be executed in the cloud without the complexity of building and maintaining virtual machine infrastructures. With SIREN, we are able to achieve a higher level of parallelism and elasticity by using a swarm of stateless functions, each working on a different batch of data, while greatly reducing system configuration overhead. Furthermore, we propose a scheduler based on Deep Reinforcement Learning to dynamically control the number and memory size of the stateless functions that should be used in each training epoch. The scheduler learns from the training process itself, in pursuit of the minimum possible training time at a given cost. With our real-world prototype implementation on AWS Lambda, extensive experimental results show that SIREN can reduce model training time by up to 44%, as compared to traditional machine learning training benchmarks on AWS EC2 at the same cost.

[USENIX '18] SOCK: Rapid Task Provisioning with Serverless-Optimized Containers

  dblp   doi

Abstract

Serverless computing promises to provide applications with cost savings and extreme elasticity. Unfortunately, slow application and container initialization can hurt common-case latency on serverless platforms. In this work, we analyze Linux container primitives, identifying scalability bottlenecks related to storage and network isolation. We also analyze Python applications from GitHub and show that importing many popular libraries adds about 100ms to startup. Based on these findings, we implement SOCK, a container system optimized for serverless workloads. Careful avoidance of kernel scalability bottlenecks gives SOCK an 18x speedup over Docker. A generalized-Zygote provisioning strategy yields an additional 3x speedup. A more sophisticated three-tier caching strategy based on Zygotes provides a 45x speedup over SOCK without Zygotes. Relative to AWS Lambda and OpenWhisk, OpenLambda with SOCK reduces platform overheads by 2.8x and 5.3x respectively in an image processing case study.
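The generalized-Zygote idea can be sketched with a plain fork: a long-lived parent pays the import cost once and forks a child per request, so children start with packages already in memory. The example below is Unix-only and uses json and hashlib as stand-ins for heavy libraries; real SOCK Zygotes also set up container, namespace, and cgroup isolation.

```python
# Zygote-style provisioning: fork warm children instead of cold-importing.
import os
import json, hashlib                          # "expensive" imports, done once in the zygote

def handle(request):
    return hashlib.sha256(json.dumps(request).encode()).hexdigest()

def serve_from_zygote(requests):
    for req in requests:
        pid = os.fork()                       # child inherits the warm imports
        if pid == 0:                          # child: run the handler, then exit
            print(f"pid {os.getpid()}: {handle(req)[:12]}")
            os._exit(0)
        os.waitpid(pid, 0)                    # parent: reap the child, fork again

if __name__ == "__main__":
    serve_from_zygote([{"n": 1}, {"n": 2}, {"n": 3}])
```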

[USENIX '18] Peeking Behind the Curtains of Serverless Platforms

  dblp   doi

Abstract

Serverless computing is an emerging paradigm in which an application's resource provisioning and scaling are managed by third-party services. Examples include AWS Lambda, Azure Functions, and Google Cloud Functions. Behind these services' easy-to-use APIs are opaque, complex infrastructure and management ecosystems. Taking on the viewpoint of a serverless customer, we conduct the largest measurement study to date, launching more than 50,000 function instances across these three services, in order to characterize their architectures, performance, and resource management efficiency. We explain how the platforms isolate the functions of different accounts, using either virtual machines or containers, which has important security implications. We characterize performance in terms of scalability, cold-start latency, and resource efficiency, with highlights including that AWS Lambda adopts a bin-packing-like strategy to maximize VM memory utilization, that severe contention between functions can arise in AWS and Azure, and that Google had bugs that allowed customers to use resources for free.

[USENIX '18] Understanding Ephemeral Storage for Serverless Analytics

  dblp   doi

Abstract

Serverless computing frameworks allow users to launch thousands of concurrent tasks with high elasticity and fine-grain resource billing without explicitly managing computing resources. While already successful for IoT and web microservices, there is increasing interest in leveraging serverless computing to run data-intensive jobs, such as interactive analytics. A key challenge in running analytics workloads on serverless platforms is enabling tasks in different execution stages to efficiently communicate data between each other via a shared data store. In this paper, we explore the suitability of different cloud storage services (e.g., object stores and distributed caches) as remote storage for serverless analytics. Our analysis leads to key insights to guide the design of an ephemeral cloud storage system, including the performance and cost efficiency of Flash storage for serverless application requirements and the need for a pay-what-you-use storage service that can support the high throughput demands of highly parallel applications.

[USENIX '18] SAND: Towards High-Performance Serverless Computing

  dblp   doi

Abstract

Serverless computing has emerged as a new cloud computing paradigm, where an application consists of individual functions that can be separately managed and executed. However, existing serverless platforms normally isolate and execute functions in separate containers, and do not exploit the interactions among functions for performance. These practices lead to high startup delays for function executions and inefficient resource usage. This paper presents SAND, a new serverless computing system that provides lower latency, better resource efficiency and more elasticity than existing serverless platforms. To achieve these properties, SAND introduces two key techniques: 1) application-level sandboxing, and 2) a hierarchical message bus. We have implemented and deployed a complete SAND system. Our results show that SAND outperforms the state-of-the-art serverless platforms significantly. For example, in a commonly-used image processing application, SAND achieves a 43% speedup compared to Apache OpenWhisk.
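The hierarchical message bus is only named in the abstract; as an illustration of the routing idea it suggests, the sketch below sends same-host triggers through a local in-memory queue and only cross-host triggers through a global publish. The host names, the queue, and publish_global() are assumptions for illustration, not SAND's API.

```python
# Local-first event routing between chained functions.
from collections import deque

local_queue = deque()

def publish_global(event):
    print(f"global bus <- {event}")           # stand-in for a real message broker

def trigger(event, target_host, this_host="host-A"):
    if target_host == this_host:
        local_queue.append(event)             # short path: same-host function chaining
    else:
        publish_global(event)                 # long path: cross-host delivery

trigger({"fn": "resize", "img": "a.png"}, target_host="host-A")
trigger({"fn": "store", "img": "a.png"}, target_host="host-B")
print(f"locally queued: {list(local_queue)}")
```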

[OSDI '18] Pocket: Elastic Ephemeral Storage for Serverless Analytics

  dblp   doi

Abstract

Serverless computing is becoming increasingly popular, enabling users to quickly launch thousands of short-lived tasks in the cloud with high elasticity and fine-grain billing. These properties make serverless computing appealing for interactive data analytics. However, exchanging intermediate data between execution stages in an analytics job is a key challenge, as direct communication between serverless tasks is difficult. The natural approach is to store such ephemeral data in a remote data store. However, existing storage systems are not designed to meet the demands of serverless applications in terms of elasticity, performance, and cost. We present Pocket, an elastic, distributed data store that automatically scales to provide applications with desired performance at low cost. Pocket dynamically rightsizes resources across multiple dimensions (CPU cores, network bandwidth, storage capacity) and leverages multiple storage technologies to minimize cost while ensuring applications are not bottlenecked on I/O. We show that Pocket achieves similar performance to ElastiCache Redis for serverless analytics applications while reducing cost by almost 60%.