Building a Serverless Antivirus Scanner for S3 - A Journey from Lambda Layers to Container Images
Ever tried to fit an elephant into a suitcase? That's what it felt like when we first attempted to create a serverless virus scanning solution for S3 uploads. Our journey took us through Lambda layers, container images, and some interesting discoveries about the limitations of serverless computing. Here's our story.
The Initial Challenge
Our requirements seemed straightforward: scan files uploaded to S3 for viruses using ClamAV, the trusted open-source antivirus engine. However, making this work in a serverless environment proved to be an interesting architectural challenge that taught us valuable lessons about Lambda's limitations and the power of container images.
Early Architectural Decisions
Before diving into implementation, we had to make several key architectural decisions:
- Synchronous vs Asynchronous Scanning
  - Should we block uploads until scanning completes?
  - How to handle large files that exceed Lambda's timeout?
  - What about files that trigger multiple S3 events?
- State Management
  - Where to store scan results?
  - How to handle scan retries?
  - Should we maintain scan history?
- Security Considerations
  - How to handle infected files?
  - Should we automatically quarantine or delete them?
  - Who gets notified when viruses are found?
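We landed on storing scan results as S3 object tags, which also answers the duplicate-event question: an object that already carries a scan tag can be skipped. Here is a minimal sketch of that check, assuming the av-status tag the scanner writes later in this post; the helper itself is illustrative and not part of the deployed code:
import { S3Client, GetObjectTaggingCommand } from "@aws-sdk/client-s3";

const s3 = new S3Client({});

// Illustrative helper: S3 event notifications are delivered at least once,
// so the same object can show up more than once. Checking for an existing
// av-status tag (the tag the scanner writes) lets duplicate events be
// skipped cheaply.
async function alreadyScanned(bucket: string, key: string): Promise<boolean> {
  const { TagSet } = await s3.send(
    new GetObjectTaggingCommand({ Bucket: bucket, Key: key })
  );
  return (TagSet ?? []).some((tag) => tag.Key === "av-status");
}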
First Iteration: The Lambda Layer Approach
Our initial solution attempted to package ClamAV into a Lambda layer. The approach seemed logical - layers are designed for sharing code and dependencies across functions. However, we quickly hit our first roadblock:
$ du -sh clamav/
372M clamav/
Lambda enforces a 250MB limit on the combined unzipped size of a function and its layers. After some optimization work (stripping debug symbols, removing man pages, and excluding virus definitions), we managed to get under the limit. But this led to our next challenge: handling the virus definitions themselves.
The Definition Distribution Problem
ClamAV's virus definitions are large (approximately 200MB) and need frequent updates. Our solution was to create a separate Lambda function that would:
- Download fresh definitions using `freshclam`
- Upload them to S3
- Allow our scanner function to download them as needed
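We never published that layer-era updater, but it looked roughly like the sketch below: run freshclam into the Lambda's temp directory, then push the resulting database files to S3. The bucket, prefix, and config path here are placeholders rather than values from the final project:
import { S3Client, PutObjectCommand } from "@aws-sdk/client-s3";
import { execFile } from "child_process";
import { promisify } from "util";
import * as fs from "fs";
import * as path from "path";

const execFileAsync = promisify(execFile);
const s3 = new S3Client({});

// Placeholder values: in the real (now retired) updater these came from
// environment variables.
const DEFINITIONS_BUCKET = "YOUR_DEFINITIONS_BUCKET";
const DEFINITIONS_PREFIX = "clamav/definitions";

export async function handler(): Promise<void> {
  const dataDir = "/tmp/clamav";
  fs.mkdirSync(dataDir, { recursive: true });

  // Pull fresh definitions into the Lambda's writable temp directory.
  // Assumes the freshclam binary from the layer is on PATH and that a
  // freshclam.conf ships alongside it (path is illustrative).
  await execFileAsync("freshclam", [
    `--datadir=${dataDir}`,
    "--config-file=/opt/clamav/freshclam.conf",
  ]);

  // Push each database file (main.cvd, daily.cvd, bytecode.cvd, ...) to S3
  // so the scanner function can fetch them at scan time.
  for (const file of fs.readdirSync(dataDir)) {
    const filePath = path.join(dataDir, file);
    await s3.send(
      new PutObjectCommand({
        Bucket: DEFINITIONS_BUCKET,
        Key: `${DEFINITIONS_PREFIX}/${file}`,
        Body: fs.createReadStream(filePath),
        ContentLength: fs.statSync(filePath).size,
      })
    );
  }
}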
While this worked technically, we encountered two significant issues:
- Performance Impact:
  - Downloading 200MB of definitions for each scan was inefficient
  - Cold starts were painfully slow (15-20 seconds)
  - S3 transfer costs became significant at scale
  - Lambda execution time was mostly spent on I/O
- Runtime Environment:
  - ClamAV requires specific user/group configurations
  - Permissions issues were hard to debug in Lambda
  - Temp directory size limits caused occasional failures
Hidden Costs and Scaling Issues
Our Layer approach revealed several hidden costs:
- S3 GET requests for definitions
- Inter-region data transfer fees
- Extended Lambda execution time
- Additional storage for definitions
At scale, these costs added up significantly:
- 1,000 scans/day = ~200GB of definition downloads
- Each scan took 20-30 seconds
- Cold starts impacted user experience
The Container Evolution
After evaluating our options, we realized that Lambda container images would solve both our major pain points:
- No Size Limits:
  - Container images can be up to 10GB
  - Room for optimization tools and utilities
  - Space for multiple definition sets
- Custom Runtime:
  - Full control over the environment
  - Proper user/group configuration
  - Custom security policies
- Prebaked Definitions:
  - Definitions included in the image
  - No runtime downloads needed
  - Faster cold starts
Container Challenges
However, containers brought their own challenges:
- Image Size Management:
  - Base image selection impacts cold start
  - Layer caching strategy is crucial
  - Need to balance size vs functionality
- Build Pipeline Complexity:
  - Daily rebuilds for fresh definitions
  - Cache management for faster builds
  - Version control for rollbacks
- Cost Considerations:
  - ECR storage costs
  - Image push/pull bandwidth
  - Build pipeline execution
The Final Architecture
Our production solution leverages several AWS services to create a fully automated virus scanning pipeline:
- EventBridge triggers nightly builds
- CodePipeline orchestrates the process
- CodeBuild creates fresh Docker images with:
  - Latest ClamAV version
  - Fresh virus definitions
  - Proper runtime configuration
- ECR stores our images
- Lambda performs the actual scanning
The scanning flow itself is straightforward: an upload to the watched S3 prefix emits an ObjectCreated event, the scanner Lambda downloads the object to /tmp, runs clamscan against the definitions baked into the image, and tags the object with the result.
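One detail worth calling out before the code: the scanner handler accepts two different payloads, because CodePipeline's Lambda invoke event nests its job under a literal "CodePipeline.job" key. A rough sketch of the union (the interface and type names are ours, for illustration only):
import { S3Event } from "aws-lambda";

// Shape of the two payloads the scanner handler accepts: ordinary S3
// object-created notifications, plus CodePipeline's Lambda invoke payload.
interface CodePipelineJobEvent {
  "CodePipeline.job": {
    id: string;
    data: Record<string, unknown>;
  };
}

type ScannerEvent = S3Event | CodePipelineJobEvent;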
The Code
Here are the files that make up the project.
Project Files
Create a new file called `package.json` in the root of your project and add the following content:
{
"name": "autohost-antivirus",
"version": "1.0.0",
"description": "Scan files for viruses on S3",
"scripts": {
"test": "jest",
"build": "tsc",
"run": "tsc src/scanner.ts && node src/scanner.js",
"deploy": "serverless deploy --verbose --stage prod"
},
"author": "Roy Firestein",
"dependencies": {
"@aws-sdk/client-codepipeline": "^3.726.1",
"@aws-sdk/client-s3": "^3.717.0"
},
"devDependencies": {
"@babel/preset-env": "^7.26.0",
"@babel/preset-typescript": "^7.26.0",
"@tsconfig/recommended": "^1.0.8",
"@types/aws-lambda": "^8.10.146",
"@types/jest": "^29.5.14",
"@types/node": "^22.10.2",
"aws-sdk-client-mock": "^4.1.0",
"babel-jest": "^29.7.0",
"jest": "^29.7.0",
"serverless": "^3.40.0",
"ts-jest": "^29.2.5",
"typescript": "^5.7.2"
},
"engines": {
"node": ">=20"
}
}
Now, install the dependencies:
npm install
Make sure you have Node.js 20 installed.
Create a new file called `Dockerfile` in the root of your project and add the following content:
FROM public.ecr.aws/lambda/nodejs:20 AS deps
# Install build dependencies
RUN dnf update -y && \
dnf install -y \
sudo tar gzip git \
cmake gcc gcc-c++ make \
openssl-devel pcre2-devel bzip2-devel zlib-devel xz-devel \
libxml2-devel json-c-devel libcurl-devel ncurses-devel \
pkgconfig zip wget shadow-utils && \
dnf clean all
# Install Rust in a separate layer
RUN curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y --profile minimal && \
. $HOME/.cargo/env && \
rustup default stable && \
rustup update
FROM deps AS builder
ARG CLAMAV_VERSION=1.4.1
ARG FRESHCLAM_CONF_SHA256="default"
# Build ClamAV - only rebuild if version or config changes
RUN --mount=type=cache,target=/tmp/clamav-cache \
. $HOME/.cargo/env && \
cd /tmp && \
curl -LO https://www.clamav.net/downloads/production/clamav-${CLAMAV_VERSION}.tar.gz && \
tar xzf clamav-${CLAMAV_VERSION}.tar.gz && \
cd clamav-${CLAMAV_VERSION} && \
mkdir build && \
cd build && \
cmake \
-D CMAKE_INSTALL_PREFIX=/opt/clamav \
-D CMAKE_BUILD_TYPE=Release \
-D ENABLE_MILTER=OFF \
-D ENABLE_UNRAR=OFF \
-D ENABLE_TESTS=OFF \
.. && \
make -j$(nproc) && \
make install
# Build Node.js dependencies in a separate layer
COPY package*.json ${LAMBDA_TASK_ROOT}/
RUN --mount=type=cache,target=/root/.npm \
npm ci
# Final stage
FROM public.ecr.aws/lambda/nodejs:20
# Install runtime dependencies
RUN dnf update -y && \
dnf install -y \
passwd \
sudo \
openssl \
pcre2 \
zlib \
bzip2 \
xz \
libxml2 \
json-c \
ncurses \
shadow-utils \
util-linux \
zip && \
dnf clean all
# Copy Node.js dependencies and source code
COPY --from=builder ${LAMBDA_TASK_ROOT}/node_modules ${LAMBDA_TASK_ROOT}/node_modules
COPY . ${LAMBDA_TASK_ROOT}/
# Transform TypeScript to JavaScript
RUN npx tsc ${LAMBDA_TASK_ROOT}/src/scanner.ts
# Copy ClamAV build artifacts
COPY --from=builder /opt/clamav /opt/clamav
# Setup ClamAV configuration and directories
COPY freshclam.conf /opt/clamav/etc/freshclam.conf
# Create clamav user and set up directories
RUN export PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:$PATH" && \
groupadd -r clamav && \
useradd -r -g clamav -d /var/lib/clamav -s /sbin/nologin clamav && \
echo "clamav ALL=(ALL) NOPASSWD: /opt/clamav/bin/freshclam" >> /etc/sudoers && \
mkdir -p /opt/clamav/share/clamav && \
mkdir -p /opt/clamav/etc && \
mkdir -p /var/log/clamav && \
mkdir -p /var/lib/clamav && \
mkdir -p /var/run/clamav && \
chown -R clamav:clamav /opt/clamav/share/clamav && \
chown -R clamav:clamav /var/lib/clamav && \
chown -R clamav:clamav /var/log/clamav && \
chown -R clamav:clamav /var/run/clamav && \
chown clamav:clamav /opt/clamav/etc/freshclam.conf && \
chmod 755 /opt/clamav/share/clamav && \
chmod 755 /var/lib/clamav && \
chmod 755 /var/log/clamav && \
chmod 755 /var/run/clamav
# Set environment variables
ENV PATH="/opt/clamav/bin:/opt/clamav/sbin:${PATH}" \
LD_LIBRARY_PATH="/opt/clamav/lib:${LD_LIBRARY_PATH:-}"
# Set locale
ENV LANG=en_US.UTF-8
# Update virus definitions if files are not present or older than 1 day
RUN if [ ! -f /opt/clamav/definitions/daily.cvd ] || [ $(find /opt/clamav/definitions/daily.cvd -mtime +1) ]; then \
freshclam --datadir=/opt/clamav/definitions; \
fi
# Use Lambda's CMD
CMD [ "src/scanner.handler" ]
Create a new file called `serverless.yml` in the root of your project and add the following content:
service: autohost-antivirus
package:
excludeDevDependencies: true
individually: false
exclude:
- .git/**
- .vscode/**
- src/__tests__/**
- clamav-*.zip
custom:
# Do not change these values
region: ${opt:region, 'us-east-1'}
stage: ${opt:stage, 'prod'}
prefix: ${self:service}-${self:custom.stage}
#
# Change these values to match your project
#
# Your S3 bucket name
bucket: "YOUR_BUCKET_NAME"
# Your service name for tagging
service: "antivirus"
# The version of ClamAV to use
clamavVersion: "1.4.1"
# Your GitHub owner
githubOwner: "AutohostAI"
# Your GitHub repository
githubRepo: "samples"
# The path to the antivirus code in your repository (relative to the root of the repository).
# This is useful if you have a monorepo and want to scan a specific subdirectory.
githubRepoPath: "."
# The branch to use
githubBranch: "master"
provider:
name: aws
runtime: nodejs20.x
architecture: x86_64
stage: ${self:custom.stage}
region: ${self:custom.region}
logRetentionInDays: 30
stackTags:
service: ${self:custom.service}
ENV: ${opt:stage, 'dev'}
Environment: ${opt:stage, 'dev'}
tags:
service: ${self:custom.service}
Environment: ${opt:stage, 'dev'}
timeout: 180
memorySize: 2048
versionFunctions: false
functions:
scanner:
description: "Scan S3 objects for viruses"
image: ${aws:accountId}.dkr.ecr.${self:provider.region}.amazonaws.com/${self:custom.prefix}-clamav:latest
role: ScannerRole
reservedConcurrency: 1
command:
- src/scanner.handler
events:
- s3:
bucket: ${self:custom.bucket}
event: s3:ObjectCreated:*
existing: true
rules:
- prefix: userdata/uploads/
# Infrastructure (CloudFormation)
resources:
Description: "Virus Scanning for S3"
Resources:
ScannerRole:
Type: AWS::IAM::Role
Properties:
RoleName: ${self:custom.prefix}-scanner-role
AssumeRolePolicyDocument:
Version: "2012-10-17"
Statement:
- Effect: Allow
Principal:
Service: lambda.amazonaws.com
Action: sts:AssumeRole
Policies:
- PolicyName: "ScannerPolicy"
PolicyDocument:
Version: "2012-10-17"
Statement:
- Effect: Allow
Action:
- logs:CreateLogGroup
- logs:CreateLogStream
- logs:PutLogEvents
- logs:TagResource
Resource:
- 'Fn::Join':
- ':'
- - 'arn:aws:logs'
- Ref: 'AWS::Region'
- Ref: 'AWS::AccountId'
- 'log-group:/aws/lambda/*:*:*'
- Effect: Allow
Action:
- s3:GetObject
- s3:GetObjectTagging
- s3:GetObjectVersion
- s3:PutObjectTagging
- s3:PutObjectVersionTagging
Resource:
# The path to the uploads directory in your S3 bucket
# Change this to match your bucket structure
- "arn:aws:s3:::${self:custom.bucket}/userdata/uploads/*"
- Effect: Allow
Action:
- s3:ListBucket
Resource:
- "arn:aws:s3:::${self:custom.bucket}"
- Effect: Allow
Action:
- ecr:GetDownloadUrlForLayer
- ecr:BatchGetImage
- ecr:GetAuthorizationToken
Resource: !GetAtt ClamAVRepository.Arn
- Effect: Allow
Action:
- codepipeline:PutJobSuccessResult
- codepipeline:PutJobFailureResult
Resource: "*"
# ECR Repository
ClamAVRepository:
Type: AWS::ECR::Repository
Properties:
RepositoryName: ${self:custom.prefix}-clamav
ImageScanningConfiguration:
ScanOnPush: true
LifecyclePolicy:
LifecyclePolicyText: |
{
"rules": [
{
"rulePriority": 1,
"description": "Keep only last 5 images",
"selection": {
"tagStatus": "any",
"countType": "imageCountMoreThan",
"countNumber": 5
},
"action": {
"type": "expire"
}
}
]
}
# CodeStar Connection for GitHub
CodeStarConnection:
Type: AWS::CodeStarConnections::Connection
Properties:
ConnectionName: ${self:custom.prefix}-github
ProviderType: GitHub
# CodeBuild Project
ClamAVBuildProject:
Type: AWS::CodeBuild::Project
Properties:
Name: ${self:custom.prefix}-clamav-build
ServiceRole: !GetAtt CodeBuildServiceRole.Arn
Artifacts:
Type: CODEPIPELINE
Environment:
Type: LINUX_CONTAINER
ComputeType: BUILD_GENERAL1_SMALL
Image: aws/codebuild/amazonlinux2-x86_64-standard:5.0
PrivilegedMode: true
EnvironmentVariables:
- Name: ECR_REPOSITORY_URI
Value: !Sub ${AWS::AccountId}.dkr.ecr.${AWS::Region}.amazonaws.com/${self:custom.prefix}-clamav
- Name: IMAGE_TAG
Value: latest
- Name: CLAMAV_VERSION
Value: ${self:custom.clamavVersion}
- Name: REPO_PATH
Value: ${self:custom.githubRepoPath}
- Name: REGION
Value: ${self:custom.region}
Source:
Type: CODEPIPELINE
BuildSpec: |
version: 0.2
phases:
pre_build:
commands:
- aws ecr get-login-password --region $REGION | docker login --username AWS --password-stdin $ECR_REPOSITORY_URI
- docker pull $ECR_REPOSITORY_URI:$IMAGE_TAG || true
- docker pull $ECR_REPOSITORY_URI:builder || true
- cd $REPO_PATH || true
# Calculate freshclam config hash for cache busting
- FRESHCLAM_CONF_SHA256=$(sha256sum freshclam.conf | cut -d' ' -f1)
build:
commands:
- DOCKER_BUILDKIT=1 docker build --build-arg CLAMAV_VERSION=$CLAMAV_VERSION --cache-from $ECR_REPOSITORY_URI:$IMAGE_TAG --cache-from $ECR_REPOSITORY_URI:builder -t $ECR_REPOSITORY_URI:$IMAGE_TAG -t $ECR_REPOSITORY_URI:builder .
post_build:
commands:
- docker push $ECR_REPOSITORY_URI:$IMAGE_TAG
- docker push $ECR_REPOSITORY_URI:builder
- cd $CODEBUILD_SRC_DIR
- printf '{"ImageURI":"%s"}' $ECR_REPOSITORY_URI:$IMAGE_TAG > imageDetail.json
artifacts:
files:
- imageDetail.json
TimeoutInMinutes: 30
QueuedTimeoutInMinutes: 30
# Lambda function to update container image
UpdateLambdaFunction:
Type: AWS::Lambda::Function
Properties:
FunctionName: ${self:custom.prefix}-update-function
Handler: index.handler
Runtime: nodejs20.x
Timeout: 30
MemorySize: 128
Role: !GetAtt UpdateLambdaRole.Arn
Layers:
- !Sub arn:aws:lambda:${AWS::Region}:${AWS::AccountId}:layer:adm-zip:1
Code:
ZipFile: |
const { S3Client, GetObjectCommand } = require('@aws-sdk/client-s3');
const { LambdaClient, UpdateFunctionCodeCommand } = require('@aws-sdk/client-lambda');
const { CodePipelineClient, PutJobSuccessResultCommand, PutJobFailureResultCommand } = require('@aws-sdk/client-codepipeline');
const AdmZip = require('adm-zip');
const s3Client = new S3Client();
const lambdaClient = new LambdaClient();
const codePipelineClient = new CodePipelineClient();
exports.handler = async (event) => {
console.log('Event:', JSON.stringify(event, null, 2));
try {
const artifactPath = event['CodePipeline.job'].data.inputArtifacts[0].location.s3Location;
const bucket = artifactPath.bucketName;
const key = artifactPath.objectKey;
// Get zip file from S3
const response = await s3Client.send(new GetObjectCommand({
Bucket: bucket,
Key: key
}));
// Convert stream to buffer
const chunks = [];
for await (const chunk of response.Body) {
chunks.push(chunk);
}
const zipBuffer = Buffer.concat(chunks);
// Extract imageDetail.json from zip
const zip = new AdmZip(zipBuffer);
const imageDetailEntry = zip.getEntry('imageDetail.json');
if (!imageDetailEntry) {
throw new Error('imageDetail.json not found in artifact');
}
const imageDetail = JSON.parse(imageDetailEntry.getData().toString('utf8'));
console.log('Image details:', imageDetail);
// Update Lambda function
await lambdaClient.send(new UpdateFunctionCodeCommand({
FunctionName: '${self:service}-${self:custom.stage}-scanner',
ImageUri: imageDetail.ImageURI
}));
// Report success
await codePipelineClient.send(new PutJobSuccessResultCommand({
jobId: event['CodePipeline.job'].id
}));
return {
statusCode: 200,
body: 'Function updated successfully'
};
} catch (error) {
console.error('Error:', error);
// Report failure
await codePipelineClient.send(new PutJobFailureResultCommand({
jobId: event['CodePipeline.job'].id,
failureDetails: {
type: 'JobFailed',
message: error.message
}
}));
throw error;
}
}
# IAM role for the update function
UpdateLambdaRole:
Type: AWS::IAM::Role
Properties:
RoleName: ${self:custom.prefix}-update-function-role
AssumeRolePolicyDocument:
Version: '2012-10-17'
Statement:
- Effect: Allow
Principal:
Service: lambda.amazonaws.com
Action: sts:AssumeRole
ManagedPolicyArns:
- arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
Policies:
- PolicyName: UpdateFunctionPolicy
PolicyDocument:
Version: '2012-10-17'
Statement:
- Effect: Allow
Action:
- lambda:UpdateFunctionCode
Resource: !GetAtt ScannerLambdaFunction.Arn
- Effect: Allow
Action:
- s3:GetObject
Resource: !Sub ${PipelineArtifactBucket.Arn}/*
- Effect: Allow
Action:
- codepipeline:PutJobSuccessResult
- codepipeline:PutJobFailureResult
Resource: "*"
- PolicyName: CloudWatchLogsPolicy
PolicyDocument:
Version: '2012-10-17'
Statement:
- Effect: Allow
Action:
- logs:CreateLogGroup
- logs:CreateLogStream
- logs:PutLogEvents
Resource: "*"
# Update CodePipeline Deploy stage
ClamAVPipeline:
Type: AWS::CodePipeline::Pipeline
Properties:
Name: ${self:custom.prefix}-clamav-pipeline
RoleArn: !GetAtt CodePipelineServiceRole.Arn
ArtifactStore:
Type: S3
Location: !Ref PipelineArtifactBucket
EncryptionKey:
Id: alias/aws/s3
Type: KMS
Stages:
- Name: Source
Actions:
- Name: Source
ActionTypeId:
Category: Source
Owner: AWS
Provider: CodeStarSourceConnection
Version: '1'
Configuration:
ConnectionArn: !Ref CodeStarConnection
FullRepositoryId: ${self:custom.githubOwner}/${self:custom.githubRepo}
BranchName: ${self:custom.githubBranch}
DetectChanges: true
OutputArtifacts:
- Name: SourceOutput
RunOrder: 1
- Name: Build
Actions:
- Name: BuildImage
ActionTypeId:
Category: Build
Owner: AWS
Provider: CodeBuild
Version: '1'
Configuration:
ProjectName: !Ref ClamAVBuildProject
InputArtifacts:
- Name: SourceOutput
OutputArtifacts:
- Name: BuildOutput
RunOrder: 1
- Name: Deploy
Actions:
- Name: UpdateFunction
ActionTypeId:
Category: Invoke
Owner: AWS
Provider: Lambda
Version: '1'
Configuration:
FunctionName: !Ref UpdateLambdaFunction
InputArtifacts:
- Name: BuildOutput
RunOrder: 1
# Pipeline Artifact Bucket
PipelineArtifactBucket:
Type: AWS::S3::Bucket
Properties:
BucketName: ${self:custom.prefix}-pipeline-artifacts
VersioningConfiguration:
Status: Enabled
PublicAccessBlockConfiguration:
BlockPublicAcls: true
BlockPublicPolicy: true
IgnorePublicAcls: true
RestrictPublicBuckets: true
LifecycleConfiguration:
Rules:
- Id: DeleteOldArtifacts
Status: Enabled
ExpirationInDays: 30
BucketEncryption:
ServerSideEncryptionConfiguration:
- ServerSideEncryptionByDefault:
SSEAlgorithm: AES256
# IAM Roles
CodeBuildServiceRole:
Type: AWS::IAM::Role
Properties:
RoleName: ${self:custom.prefix}-codebuild-role
AssumeRolePolicyDocument:
Version: '2012-10-17'
Statement:
- Effect: Allow
Principal:
Service: codebuild.amazonaws.com
Action: sts:AssumeRole
Policies:
- PolicyName: CodeBuildBasePolicy
PolicyDocument:
Version: '2012-10-17'
Statement:
- Effect: Allow
Action:
- logs:CreateLogGroup
- logs:CreateLogStream
- logs:PutLogEvents
Resource:
- !Sub arn:aws:logs:${AWS::Region}:${AWS::AccountId}:log-group:/aws/codebuild/${self:custom.prefix}-clamav-build:*
- Effect: Allow
Action:
- s3:GetObject
- s3:GetObjectVersion
- s3:PutObject
Resource:
- !Sub ${PipelineArtifactBucket.Arn}/*
- Effect: Allow
Action:
- ecr:GetAuthorizationToken
Resource: "*"
- Effect: Allow
Action:
- ecr:BatchCheckLayerAvailability
- ecr:GetDownloadUrlForLayer
- ecr:GetRepositoryPolicy
- ecr:DescribeRepositories
- ecr:ListImages
- ecr:DescribeImages
- ecr:BatchGetImage
- ecr:InitiateLayerUpload
- ecr:UploadLayerPart
- ecr:CompleteLayerUpload
- ecr:PutImage
Resource: !GetAtt ClamAVRepository.Arn
- Effect: Allow
Action:
- lambda:InvokeFunction
Resource: !GetAtt UpdateLambdaFunction.Arn
CodePipelineServiceRole:
Type: AWS::IAM::Role
Properties:
RoleName: ${self:custom.prefix}-codepipeline-role
AssumeRolePolicyDocument:
Version: '2012-10-17'
Statement:
- Effect: Allow
Principal:
Service: codepipeline.amazonaws.com
Action: sts:AssumeRole
Policies:
- PolicyName: CodePipelineAccess
PolicyDocument:
Version: '2012-10-17'
Statement:
- Effect: Allow
Action:
- s3:GetObject
- s3:GetObjectVersion
- s3:GetBucketVersioning
- s3:PutObject
- s3:ListBucket
Resource:
- !GetAtt PipelineArtifactBucket.Arn
- !Sub ${PipelineArtifactBucket.Arn}/*
- Effect: Allow
Action:
- codebuild:BatchGetBuilds
- codebuild:StartBuild
Resource: !GetAtt ClamAVBuildProject.Arn
- Effect: Allow
Action:
- lambda:InvokeFunction
Resource: !GetAtt UpdateLambdaFunction.Arn
- Effect: Allow
Action:
- codestar-connections:UseConnection
Resource: !Ref CodeStarConnection
- Effect: Allow
Action:
- iam:PassRole
Resource: '*'
Condition:
StringEquals:
iam:PassedToService:
- codebuild.amazonaws.com
- lambda.amazonaws.com
# EventBridge Rule for nightly builds
NightlyBuildRule:
Type: AWS::Events::Rule
Properties:
Name: ${self:custom.prefix}-nightly-build
Description: "Trigger ClamAV build pipeline nightly"
ScheduleExpression: "cron(0 0 * * ? *)"
State: ENABLED
Targets:
- Arn: !Sub arn:aws:codepipeline:${AWS::Region}:${AWS::AccountId}:${ClamAVPipeline}
Id: NightlyBuildTarget
RoleArn: !GetAtt EventBridgeRole.Arn
EventBridgeRole:
Type: AWS::IAM::Role
Properties:
RoleName: ${self:custom.prefix}-eventbridge-role
AssumeRolePolicyDocument:
Version: '2012-10-17'
Statement:
- Effect: Allow
Principal:
Service: events.amazonaws.com
Action: sts:AssumeRole
Policies:
- PolicyName: StartPipeline
PolicyDocument:
Version: '2012-10-17'
Statement:
- Effect: Allow
Action: codepipeline:StartPipelineExecution
Resource: !Sub arn:aws:codepipeline:${AWS::Region}:${AWS::AccountId}:${ClamAVPipeline}
Create a new file called `freshclam.conf` in the root of your project and add the following content:
# Database mirror
DatabaseMirror database.clamav.net
# Database directory
DatabaseDirectory /var/lib/clamav
# Update log file
UpdateLogFile /var/log/clamav/freshclam.log
# Log time
LogTime yes
# PID file
PidFile /var/run/clamav/freshclam.pid
# Database owner
DatabaseOwner clamav
# Log file size limit
LogFileMaxSize 2M
Create a new file called `scanner.ts` in the `src` directory and add the following content:
import {
S3Client,
GetObjectCommand,
PutObjectTaggingCommand,
} from "@aws-sdk/client-s3";
import {
CodePipelineClient,
PutJobSuccessResultCommand,
PutJobFailureResultCommand,
} from "@aws-sdk/client-codepipeline";
import { S3Event } from "aws-lambda";
import { execFile } from "child_process";
import { promisify } from "util";
import * as fs from "fs";
import * as path from "path";
import { Readable } from "stream";
const execFileAsync = promisify(execFile);
const s3Client = new S3Client({
region: process.env.AWS_REGION,
});
const codePipelineClient = new CodePipelineClient({
region: process.env.AWS_REGION,
});
// Maximum file size to scan (100MB)
const MAX_FILE_SIZE = 100 * 1024 * 1024;
// Path to ClamAV definitions in the container
const CLAMAV_DEFINITIONS_PATH = "/opt/clamav/definitions";
/**
* Lambda function handler that processes both S3 events and CodePipeline job notifications
* @param event - Either an S3Event or CodePipeline job notification
*/
export async function handler(event: any) {
// Check if this is a CodePipeline job notification
  // CodePipeline's Lambda invoke event nests the job under the literal
  // "CodePipeline.job" key
  if (event["CodePipeline.job"]) {
    const jobId = event["CodePipeline.job"].id;
try {
// For CodePipeline jobs, we just need to acknowledge success
// The actual update is handled by Lambda's image configuration
await codePipelineClient.send(
new PutJobSuccessResultCommand({
jobId,
})
);
console.log("Successfully notified CodePipeline of job completion");
} catch (error) {
console.error("Failed to notify CodePipeline:", error);
await codePipelineClient.send(
new PutJobFailureResultCommand({
jobId,
failureDetails: {
message: error instanceof Error ? error.message : "Unknown error",
type: "JobFailed",
},
})
);
}
return;
}
// Handle S3 events
const s3Event = event as S3Event;
// 1) For each S3 record (object created event)
for (const record of s3Event.Records) {
const bucketName = record.s3.bucket.name;
const objectKey = decodeURIComponent(
record.s3.object.key.replace(/\+/g, " ")
);
const objectSize = record.s3.object.size;
console.log(`Processing object: ${bucketName}/${objectKey}`);
// Check file size
if (objectSize > MAX_FILE_SIZE) {
console.warn(`Object too large to scan: ${objectSize} bytes`);
await tagObject(bucketName, objectKey, "TOO_LARGE");
continue;
}
try {
// 2) Download the object to /tmp
const localObjectPath = path.join("/tmp", path.basename(objectKey));
await downloadFile(bucketName, objectKey, localObjectPath);
try {
// 3) Run clamscan with container-bundled definitions
console.log("Running virus scan...");
let scanResult;
try {
scanResult = await execFileAsync("/opt/clamav/bin/clamscan", [
`--database=${CLAMAV_DEFINITIONS_PATH}`,
localObjectPath,
]);
} catch (error: any) {
// ClamAV returns exit code 1 when a virus is found
// This is expected behavior, not an error
if (error.code === 1) {
scanResult = {
stdout: error.stdout,
stderr: error.stderr,
};
} else {
// Real error occurred
throw error;
}
}
console.log("clamscan output:", scanResult.stdout);
if (scanResult.stderr)
console.error("clamscan errors:", scanResult.stderr);
// Check if the output indicates a virus
const status = scanResult.stdout.includes("FOUND")
? "INFECTED"
: "CLEAN";
console.log(`Scan result for ${objectKey}: ${status}`);
// 4) Tag the S3 object with the scan result
await tagObject(bucketName, objectKey, status);
} finally {
// Clean up scanned file
if (fs.existsSync(localObjectPath)) {
try {
fs.unlinkSync(localObjectPath);
} catch (error) {
console.error(`Error deleting file ${localObjectPath}:`, error);
}
}
}
} catch (error) {
console.error(`Error processing ${objectKey}:`, error);
await tagObject(bucketName, objectKey, "ERROR");
}
}
}
/**
* Tags an S3 object with the scan result.
*/
async function tagObject(
bucket: string,
key: string,
status: string
): Promise<void> {
try {
const command = new PutObjectTaggingCommand({
Bucket: bucket,
Key: key,
Tagging: {
TagSet: [
{
Key: "av-status",
Value: status,
},
{
Key: "av-timestamp",
Value: new Date().toISOString(),
},
],
},
});
await s3Client.send(command);
} catch (error) {
console.error(`Error tagging object ${bucket}/${key}:`, error);
throw error;
}
}
/**
* Downloads an S3 object to a local path.
*/
async function downloadFile(
bucket: string,
key: string,
localPath: string
): Promise<void> {
try {
const command = new GetObjectCommand({
Bucket: bucket,
Key: key,
});
const response = await s3Client.send(command);
if (!response.Body) {
throw new Error(`Empty response body for ${bucket}/${key}`);
}
const body = response.Body as Readable;
const writeStream = fs.createWriteStream(localPath);
return new Promise((resolve, reject) => {
body.pipe(writeStream);
body.on("error", reject);
writeStream.on("finish", resolve);
writeStream.on("error", reject);
});
} catch (error) {
console.error(`Error downloading ${bucket}/${key}:`, error);
throw error;
}
}
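Since aws-sdk-client-mock and Jest are already in the devDependencies, the size-limit branch of the handler can be unit-tested without AWS access or a local ClamAV install. Here is a sketch of what such a test might look like; the file path matches the src/__tests__/ exclude pattern in serverless.yml, and the event payload and assertions are illustrative:
// src/__tests__/scanner.test.ts
import { mockClient } from "aws-sdk-client-mock";
import { S3Client, PutObjectTaggingCommand } from "@aws-sdk/client-s3";
import { handler } from "../scanner";

// Stubs S3Client.prototype.send, so the client created inside scanner.ts
// is mocked as well.
const s3Mock = mockClient(S3Client);

beforeEach(() => s3Mock.reset());

test("objects over the size limit are tagged TOO_LARGE without being downloaded", async () => {
  s3Mock.on(PutObjectTaggingCommand).resolves({});

  const event = {
    Records: [
      {
        s3: {
          bucket: { name: "my-bucket" },
          object: { key: "userdata/uploads/huge.bin", size: 500 * 1024 * 1024 },
        },
      },
    ],
  };

  await handler(event as any);

  const calls = s3Mock.commandCalls(PutObjectTaggingCommand);
  expect(calls).toHaveLength(1);
  expect(calls[0].args[0].input.Tagging?.TagSet?.[0]).toEqual({
    Key: "av-status",
    Value: "TOO_LARGE",
  });
});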
Now we need to build a Lambda layer that provides the adm-zip dependency for the update function invoked during the CodePipeline job. Run the following commands to build the layer:
# Create layer directory
mkdir -p lambda-layers/adm-zip/nodejs
cd lambda-layers/adm-zip/nodejs
# Initialize package.json and install dependencies
npm init -y
npm install adm-zip @types/adm-zip
# Create layer zip
cd ..
zip -r adm-zip.zip nodejs/
# Create the layer in AWS (Note: Use the same region as your deployment)
aws lambda publish-layer-version \
--layer-name adm-zip \
--description "AdmZip for creating zip files" \
--license-info "MIT" \
--zip-file fileb://adm-zip.zip \
--compatible-runtimes nodejs20.x \
--compatible-architectures x86_64 arm64 \
--region <your-region>
# Clean up
cd ../..
rm -rf lambda-layers
Finally, deploy the Serverless stack:
npx serverless deploy --verbose --stage prod
This will:
- Create all necessary IAM roles and policies
- Set up the ECR repository
- Configure CodeBuild and CodePipeline
- Deploy the Lambda functions
- Create the EventBridge rule for nightly builds
After deployment:
- Accept the CodeStar connection in the AWS Console (first time only)
- The first build will start automatically after the CodeStar connection is established
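Once the scanner is running, downstream services can consult the av-status tag before serving or processing an upload, which is one practical answer to the "how to handle infected files?" question from the start of this post. A minimal sketch follows; the allow-only-CLEAN policy is an example, not something the stack enforces for you:
import { S3Client, GetObjectTaggingCommand } from "@aws-sdk/client-s3";

const s3 = new S3Client({});

// Example policy: only serve objects the scanner has explicitly tagged CLEAN.
// INFECTED, ERROR, TOO_LARGE, or not-yet-scanned objects are all rejected.
export async function isSafeToServe(bucket: string, key: string): Promise<boolean> {
  const { TagSet } = await s3.send(
    new GetObjectTaggingCommand({ Bucket: bucket, Key: key })
  );
  const status = TagSet?.find((tag) => tag.Key === "av-status")?.Value;
  return status === "CLEAN";
}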
Key Learnings
- Consider the Full Picture: While Lambda layers are excellent for code sharing, they're not always the best solution for complex runtime dependencies.
- Leverage Container Benefits: Container images provide more flexibility and control over the runtime environment, making them ideal for complex applications.
- Automate Everything: Our nightly build process ensures we always have fresh virus definitions without manual intervention.
- Think About Scale: The initial solution of downloading definitions per scan wouldn't scale well. Sometimes, a more complex architecture can actually be more efficient.
Results and Benefits
Our final solution provides:
- Automatic daily updates of virus definitions
- Proper runtime environment configuration
- Efficient scanning without repeated downloads
- Zero maintenance overhead
- Complete infrastructure as code deployment
Conclusion
Building a serverless virus scanner taught us valuable lessons about serverless architecture and its limitations. While our initial Lambda layer approach seemed simpler, the container-based solution proved more robust and maintainable in the long run.
Remember: the simplest solution isn't always the best one. Sometimes, embracing a bit more complexity in your architecture can lead to a more elegant and maintainable solution.
Future Improvements
We're considering several enhancements:
- Multi-region deployment for reduced latency
- Enhanced quarantine mechanisms
- Machine learning for improved detection
- Real-time threat intelligence integration
- Customized virus definitions for specific use cases