Building a Serverless Antivirus Scanner for S3 - A Journey from Lambda Layers to Container Images
Ever tried to fit an elephant into a suitcase? That's what it felt like when we first attempted to create a serverless virus scanning solution for S3 uploads. Our journey took us through Lambda layers, container images, and some interesting discoveries about the limitations of serverless computing. Here's our story.
The Initial Challenge
Our requirements seemed straightforward: scan files uploaded to S3 for viruses using ClamAV, the trusted open-source antivirus engine. However, making this work in a serverless environment proved to be an interesting architectural challenge that taught us valuable lessons about Lambda's limitations and the power of container images.
Early Architectural Decisions
Before diving into implementation, we had to make several key architectural decisions:
- Synchronous vs Asynchronous Scanning
  - Should we block uploads until scanning completes?
  - How to handle large files that exceed Lambda's timeout?
  - What about files that trigger multiple S3 events?
- State Management
  - Where to store scan results?
  - How to handle scan retries?
  - Should we maintain scan history?
- Security Considerations
  - How to handle infected files?
  - Should we automatically quarantine or delete them?
  - Who gets notified when viruses are found?
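We landed on storing scan results as S3 object tags, which also answers the duplicate-event question: an object that already carries a scan tag can be skipped. Here is a minimal sketch of that check, assuming the av-status tag the scanner writes later in this post; the helper itself is illustrative and not part of the deployed code:
import { S3Client, GetObjectTaggingCommand } from "@aws-sdk/client-s3";

const s3 = new S3Client({});

// Illustrative helper: S3 event notifications are delivered at least once,
// so the same object can show up more than once. Checking for an existing
// av-status tag (the tag the scanner writes) lets duplicate events be
// skipped cheaply.
async function alreadyScanned(bucket: string, key: string): Promise<boolean> {
  const { TagSet } = await s3.send(
    new GetObjectTaggingCommand({ Bucket: bucket, Key: key })
  );
  return (TagSet ?? []).some((tag) => tag.Key === "av-status");
}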
First Iteration: The Lambda Layer Approach
Our initial solution attempted to package ClamAV into a Lambda layer. The approach seemed logical - layers are designed for sharing code and dependencies across functions. However, we quickly hit our first roadblock:
$ du -sh clamav/
372M clamav/
Lambda enforces a 250MB limit on the combined unzipped size of a function and its layers. After some optimization work (stripping debug symbols, removing man pages, and excluding virus definitions), we managed to get under the limit. But this led to our next challenge: handling the virus definitions themselves.
The Definition Distribution Problem
ClamAV's virus definitions are large (approximately 200MB) and need frequent updates. Our solution was to create a separate Lambda function that would:
- Download fresh definitions using `freshclam`
- Upload them to S3
- Allow our scanner function to download them as needed
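We never published that layer-era updater, but it looked roughly like the sketch below: run freshclam into the Lambda's temp directory, then push the resulting database files to S3. The bucket, prefix, and config path here are placeholders rather than values from the final project:
import { S3Client, PutObjectCommand } from "@aws-sdk/client-s3";
import { execFile } from "child_process";
import { promisify } from "util";
import * as fs from "fs";
import * as path from "path";

const execFileAsync = promisify(execFile);
const s3 = new S3Client({});

// Placeholder values: in the real (now retired) updater these came from
// environment variables.
const DEFINITIONS_BUCKET = "YOUR_DEFINITIONS_BUCKET";
const DEFINITIONS_PREFIX = "clamav/definitions";

export async function handler(): Promise<void> {
  const dataDir = "/tmp/clamav";
  fs.mkdirSync(dataDir, { recursive: true });

  // Pull fresh definitions into the Lambda's writable temp directory.
  // Assumes the freshclam binary from the layer is on PATH and that a
  // freshclam.conf ships alongside it (path is illustrative).
  await execFileAsync("freshclam", [
    `--datadir=${dataDir}`,
    "--config-file=/opt/clamav/freshclam.conf",
  ]);

  // Push each database file (main.cvd, daily.cvd, bytecode.cvd, ...) to S3
  // so the scanner function can fetch them at scan time.
  for (const file of fs.readdirSync(dataDir)) {
    const filePath = path.join(dataDir, file);
    await s3.send(
      new PutObjectCommand({
        Bucket: DEFINITIONS_BUCKET,
        Key: `${DEFINITIONS_PREFIX}/${file}`,
        Body: fs.createReadStream(filePath),
        ContentLength: fs.statSync(filePath).size,
      })
    );
  }
}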
While this worked technically, we encountered two significant issues:
- Performance Impact:
  - Downloading 200MB of definitions for each scan was inefficient
  - Cold starts were painfully slow (15-20 seconds)
  - S3 transfer costs became significant at scale
  - Lambda execution time was mostly spent on I/O
- Runtime Environment:
  - ClamAV requires specific user/group configurations
  - Permissions issues were hard to debug in Lambda
  - Temp directory size limits caused occasional failures
Hidden Costs and Scaling Issues
Our Layer approach revealed several hidden costs:
- S3 GET requests for definitions
- Inter-region data transfer fees
- Extended Lambda execution time
- Additional storage for definitions
At scale, these costs added up significantly:
- 1,000 scans/day = ~200GB of definition downloads
- Each scan took 20-30 seconds
- Cold starts impacted user experience
The Container Evolution
After evaluating our options, we realized that Lambda container images would solve both our major pain points:
- No Size Limits:
  - Container images can be up to 10GB
  - Room for optimization tools and utilities
  - Space for multiple definition sets
- Custom Runtime:
  - Full control over the environment
  - Proper user/group configuration
  - Custom security policies
- Prebaked Definitions:
  - Definitions included in the image
  - No runtime downloads needed
  - Faster cold starts
Container Challenges
However, containers brought their own challenges:
- Image Size Management:
  - Base image selection impacts cold start
  - Layer caching strategy is crucial
  - Need to balance size vs functionality
- Build Pipeline Complexity:
  - Daily rebuilds for fresh definitions
  - Cache management for faster builds
  - Version control for rollbacks
- Cost Considerations:
  - ECR storage costs
  - Image push/pull bandwidth
  - Build pipeline execution
The Final Architecture
Our production solution leverages several AWS services to create a fully automated virus scanning pipeline:
- EventBridge triggers nightly builds
- CodePipeline orchestrates the process
- CodeBuild creates fresh Docker images with:
  - Latest ClamAV version
  - Fresh virus definitions
  - Proper runtime configuration
- ECR stores our images
- Lambda performs the actual scanning
The scanning flow itself is straightforward: an upload to the watched S3 prefix emits an ObjectCreated event, the scanner Lambda downloads the object to /tmp, runs clamscan against the definitions baked into the image, and tags the object with the result.
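One detail worth calling out before the code: the scanner handler accepts two different payloads, because CodePipeline's Lambda invoke event nests its job under a literal "CodePipeline.job" key. A rough sketch of the union (the interface and type names are ours, for illustration only):
import { S3Event } from "aws-lambda";

// Shape of the two payloads the scanner handler accepts: ordinary S3
// object-created notifications, plus CodePipeline's Lambda invoke payload.
interface CodePipelineJobEvent {
  "CodePipeline.job": {
    id: string;
    data: Record<string, unknown>;
  };
}

type ScannerEvent = S3Event | CodePipelineJobEvent;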
The Code
Here are the files that make up the project.
Project Files
Create a new file called `package.json` in the root of your project and add the following content:
{
"name": "autohost-antivirus",
"version": "1.0.0",
"description": "Scan files for viruses on S3",
"scripts": {
"test": "jest",
"build": "tsc",
"run": "tsc src/scanner.ts && node src/scanner.js",
"deploy": "serverless deploy --verbose --stage prod"
},
"author": "Roy Firestein",
"dependencies": {
"@aws-sdk/client-codepipeline": "^3.726.1",
"@aws-sdk/client-s3": "^3.717.0"
},
"devDependencies": {
"@babel/preset-env": "^7.26.0",
"@babel/preset-typescript": "^7.26.0",
"@tsconfig/recommended": "^1.0.8",
"@types/aws-lambda": "^8.10.146",
"@types/jest": "^29.5.14",
"@types/node": "^22.10.2",
"aws-sdk-client-mock": "^4.1.0",
"babel-jest": "^29.7.0",
"jest": "^29.7.0",
"serverless": "^3.40.0",
"ts-jest": "^29.2.5",
"typescript": "^5.7.2"
},
"engines": {
"node": ">=20"
}
}
Now, install the dependencies:
npm install
Make sure you have Node.js 20 installed.
Create a new file called `Dockerfile` in the root of your project and add the following content:
FROM public.ecr.aws/lambda/nodejs:20 AS deps
# Install build dependencies
RUN dnf update -y && \
dnf install -y \
sudo tar gzip git \
cmake gcc gcc-c++ make \
openssl-devel pcre2-devel bzip2-devel zlib-devel xz-devel \
libxml2-devel json-c-devel libcurl-devel ncurses-devel \
pkgconfig zip wget shadow-utils && \
dnf clean all
# Install Rust in a separate layer
RUN curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y --profile minimal && \
. $HOME/.cargo/env && \
rustup default stable && \
rustup update
FROM deps AS builder
ARG CLAMAV_VERSION=1.4.1
ARG FRESHCLAM_CONF_SHA256="default"
# Build ClamAV - only rebuild if version or config changes
RUN --mount=type=cache,target=/tmp/clamav-cache \
. $HOME/.cargo/env && \
cd /tmp && \
curl -LO https://www.clamav.net/downloads/production/clamav-${CLAMAV_VERSION}.tar.gz && \
tar xzf clamav-${CLAMAV_VERSION}.tar.gz && \
cd clamav-${CLAMAV_VERSION} && \
mkdir build && \
cd build && \
cmake \
-D CMAKE_INSTALL_PREFIX=/opt/clamav \
-D CMAKE_BUILD_TYPE=Release \
-D ENABLE_MILTER=OFF \
-D ENABLE_UNRAR=OFF \
-D ENABLE_TESTS=OFF \
.. && \
make -j$(nproc) && \
make install
# Build Node.js dependencies in a separate layer
COPY package*.json ${LAMBDA_TASK_ROOT}/
RUN --mount=type=cache,target=/root/.npm \
npm ci
# Final stage
FROM public.ecr.aws/lambda/nodejs:20
# Install runtime dependencies
RUN dnf update -y && \
dnf install -y \
passwd \
sudo \
openssl \
pcre2 \
zlib \
bzip2 \
xz \
libxml2 \
json-c \
ncurses \
shadow-utils \
util-linux \
zip && \
dnf clean all
# Copy Node.js dependencies and source code
COPY --from=builder ${LAMBDA_TASK_ROOT}/node_modules ${LAMBDA_TASK_ROOT}/node_modules
COPY . ${LAMBDA_TASK_ROOT}/
# Transform TypeScript to JavaScript
RUN npx tsc ${LAMBDA_TASK_ROOT}/src/scanner.ts
# Copy ClamAV build artifacts
COPY --from=builder /opt/clamav /opt/clamav
# Setup ClamAV configuration and directories
COPY freshclam.conf /opt/clamav/etc/freshclam.conf
# Create clamav user and set up directories
RUN export PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:$PATH" && \
groupadd -r clamav && \
useradd -r -g clamav -d /var/lib/clamav -s /sbin/nologin clamav && \
echo "clamav ALL=(ALL) NOPASSWD: /opt/clamav/bin/freshclam" >> /etc/sudoers && \
mkdir -p /opt/clamav/share/clamav && \
mkdir -p /opt/clamav/etc && \
mkdir -p /var/log/clamav && \
mkdir -p /var/lib/clamav && \
mkdir -p /var/run/clamav && \
chown -R clamav:clamav /opt/clamav/share/clamav && \
chown -R clamav:clamav /var/lib/clamav && \
chown -R clamav:clamav /var/log/clamav && \
chown -R clamav:clamav /var/run/clamav && \
chown clamav:clamav /opt/clamav/etc/freshclam.conf && \
chmod 755 /opt/clamav/share/clamav && \
chmod 755 /var/lib/clamav && \
chmod 755 /var/log/clamav && \
chmod 755 /var/run/clamav
# Set environment variables
ENV PATH="/opt/clamav/bin:/opt/clamav/sbin:${PATH}" \
LD_LIBRARY_PATH="/opt/clamav/lib:${LD_LIBRARY_PATH:-}"
# Set locale
ENV LANG=en_US.UTF-8
# Update virus definitions if files are not present or older than 1 day
RUN if [ ! -f /opt/clamav/definitions/daily.cvd ] || [ $(find /opt/clamav/definitions/daily.cvd -mtime +1) ]; then \
freshclam --datadir=/opt/clamav/definitions; \
fi
# Use Lambda's CMD
CMD [ "src/scanner.handler" ]
Create a new file called `serverless.yml` in the root of your project and add the following content:
service: autohost-antivirus
package:
excludeDevDependencies: true
individually: false
exclude:
- .git/**
- .vscode/**
- src/__tests__/**
- clamav-*.zip
custom:
# Do not change these values
region: ${opt:region, 'us-east-1'}
stage: ${opt:stage, 'prod'}
prefix: ${self:service}-${self:custom.stage}
#
# Change these values to match your project
#
# Your S3 bucket name
bucket: "YOUR_BUCKET_NAME"
# Your service name for tagging
service: "antivirus"
# The version of ClamAV to use
clamavVersion: "1.4.1"
# Your GitHub owner
githubOwner: "AutohostAI"
# Your GitHub repository
githubRepo: "samples"
# The path to the antivirus code in your repository (relative to the root of the repository).
# This is useful if you have a monorepo and want to scan a specific subdirectory.
githubRepoPath: "."
# The branch to use
githubBranch: "master"
provider:
name: aws
runtime: nodejs20.x
architecture: x86_64
stage: ${self:custom.stage}
region: ${self:custom.region}
logRetentionInDays: 30
stackTags:
service: ${self:custom.service}
ENV: ${opt:stage, 'dev'}
Environment: ${opt:stage, 'dev'}
tags:
service: ${self:custom.service}
Environment: ${opt:stage, 'dev'}
timeout: 180
memorySize: 2048
versionFunctions: false
functions:
scanner:
description: "Scan S3 objects for viruses"
image: ${aws:accountId}.dkr.ecr.${self:provider.region}.amazonaws.com/${self:custom.prefix}-clamav:latest
role: ScannerRole
reservedConcurrency: 1
command:
- src/scanner.handler
events:
- s3:
bucket: ${self:custom.bucket}
event: s3:ObjectCreated:*
existing: true
rules:
- prefix: userdata/uploads/
# Infrastructure (CloudFormation)
resources:
Description: "Virus Scanning for S3"
Resources:
ScannerRole:
Type: AWS::IAM::Role
Properties:
RoleName: ${self:custom.prefix}-scanner-role
AssumeRolePolicyDocument:
Version: "2012-10-17"
Statement:
- Effect: Allow
Principal:
Service: lambda.amazonaws.com
Action: sts:AssumeRole
Policies:
- PolicyName: "ScannerPolicy"
PolicyDocument:
Version: "2012-10-17"
Statement:
- Effect: Allow
Action:
- logs:CreateLogGroup
- logs:CreateLogStream
- logs:PutLogEvents
- logs:TagResource
Resource:
- 'Fn::Join':
- ':'
- - 'arn:aws:logs'
- Ref: 'AWS::Region'
- Ref: 'AWS::AccountId'
- 'log-group:/aws/lambda/*:*:*'
- Effect: Allow
Action:
- s3:GetObject
- s3:GetObjectTagging
- s3:GetObjectVersion
- s3:PutObjectTagging
- s3:PutObjectVersionTagging
Resource:
# The path to the uploads directory in your S3 bucket
# Change this to match your bucket structure
- "arn:aws:s3:::${self:custom.bucket}/userdata/uploads/*"
- Effect: Allow
Action:
- s3:ListBucket
Resource:
- "arn:aws:s3:::${self:custom.bucket}"
- Effect: Allow
Action:
- ecr:GetDownloadUrlForLayer
- ecr:BatchGetImage
- ecr:GetAuthorizationToken
Resource: !GetAtt ClamAVRepository.Arn
- Effect: Allow
Action:
- codepipeline:PutJobSuccessResult
- codepipeline:PutJobFailureResult
Resource: "*"
# ECR Repository
ClamAVRepository:
Type: AWS::ECR::Repository
Properties:
RepositoryName: ${self:custom.prefix}-clamav
ImageScanningConfiguration:
ScanOnPush: true
LifecyclePolicy:
LifecyclePolicyText: |
{
"rules": [
{
"rulePriority": 1,
"description": "Keep only last 5 images",
"selection": {
"tagStatus": "any",
"countType": "imageCountMoreThan",
"countNumber": 5
},
"action": {
"type": "expire"
}
}
]
}
# CodeStar Connection for GitHub
CodeStarConnection:
Type: AWS::CodeStarConnections::Connection
Properties:
ConnectionName: ${self:custom.prefix}-github
ProviderType: GitHub
# CodeBuild Project
ClamAVBuildProject:
Type: AWS::CodeBuild::Project
Properties:
Name: ${self:custom.prefix}-clamav-build
ServiceRole: !GetAtt CodeBuildServiceRole.Arn
Artifacts:
Type: CODEPIPELINE
Environment:
Type: LINUX_CONTAINER
ComputeType: BUILD_GENERAL1_SMALL
Image: aws/codebuild/amazonlinux2-x86_64-standard:5.0
PrivilegedMode: true
EnvironmentVariables:
- Name: ECR_REPOSITORY_URI
Value: !Sub ${AWS::AccountId}.dkr.ecr.${AWS::Region}.amazonaws.com/${self:custom.prefix}-clamav
- Name: IMAGE_TAG
Value: latest
- Name: CLAMAV_VERSION
Value: ${self:custom.clamavVersion}
- Name: REPO_PATH
Value: ${self:custom.githubRepoPath}
- Name: REGION
Value: ${self:custom.region}
Source:
Type: CODEPIPELINE
BuildSpec: |
version: 0.2
phases:
pre_build:
commands:
- aws ecr get-login-password --region $REGION | docker login --username AWS --password-stdin $ECR_REPOSITORY_URI
- docker pull $ECR_REPOSITORY_URI:$IMAGE_TAG || true
- docker pull $ECR_REPOSITORY_URI:builder || true
- cd $REPO_PATH || true
# Calculate freshclam config hash for cache busting
- FRESHCLAM_CONF_SHA256=$(sha256sum freshclam.conf | cut -d' ' -f1)
build:
commands:
- DOCKER_BUILDKIT=1 docker build --build-arg CLAMAV_VERSION=$CLAMAV_VERSION --cache-from $ECR_REPOSITORY_URI:$IMAGE_TAG --cache-from $ECR_REPOSITORY_URI:builder -t $ECR_REPOSITORY_URI:$IMAGE_TAG -t $ECR_REPOSITORY_URI:builder .
post_build:
commands:
- docker push $ECR_REPOSITORY_URI:$IMAGE_TAG
- docker push $ECR_REPOSITORY_URI:builder
- cd $CODEBUILD_SRC_DIR
- printf '{"ImageURI":"%s"}' $ECR_REPOSITORY_URI:$IMAGE_TAG > imageDetail.json
artifacts:
files:
- imageDetail.json
TimeoutInMinutes: 30
QueuedTimeoutInMinutes: 30
# Lambda function to update container image
UpdateLambdaFunction:
Type: AWS::Lambda::Function
Properties:
FunctionName: ${self:custom.prefix}-update-function
Handler: index.handler
Runtime: nodejs20.x
Timeout: 30
MemorySize: 128
Role: !GetAtt UpdateLambdaRole.Arn
Layers:
- !Sub arn:aws:lambda:${AWS::Region}:${AWS::AccountId}:layer:adm-zip:1
Code:
ZipFile: |
const { S3Client, GetObjectCommand } = require('@aws-sdk/client-s3');
const { LambdaClient, UpdateFunctionCodeCommand } = require('@aws-sdk/client-lambda');
const { CodePipelineClient, PutJobSuccessResultCommand, PutJobFailureResultCommand } = require('@aws-sdk/client-codepipeline');
const AdmZip = require('adm-zip');
const s3Client = new S3Client();
const lambdaClient = new LambdaClient();
const codePipelineClient = new CodePipelineClient();
exports.handler = async (event) => {
console.log('Event:', JSON.stringify(event, null, 2));
try {
const artifactPath = event['CodePipeline.job'].data.inputArtifacts[0].location.s3Location;
const bucket = artifactPath.bucketName;
const key = artifactPath.objectKey;
// Get zip file from S3
const response = await s3Client.send(new GetObjectCommand({
Bucket: bucket,
Key: key
}));
// Convert stream to buffer
const chunks = [];
for await (const chunk of response.Body) {
chunks.push(chunk);
}
const zipBuffer = Buffer.concat(chunks);
// Extract imageDetail.json from zip
const zip = new AdmZip(zipBuffer);
const imageDetailEntry = zip.getEntry('imageDetail.json');
if (!imageDetailEntry) {
throw new Error('imageDetail.json not found in artifact');
}
const imageDetail = JSON.parse(imageDetailEntry.getData().toString('utf8'));
console.log('Image details:', imageDetail);
// Update Lambda function
await lambdaClient.send(new UpdateFunctionCodeCommand({
FunctionName: '${self:service}-${self:custom.stage}-scanner',
ImageUri: imageDetail.ImageURI
}));
// Report success
await codePipelineClient.send(new PutJobSuccessResultCommand({
jobId: event['CodePipeline.job'].id
}));
return {
statusCode: 200,
body: 'Function updated successfully'
};
} catch (error) {
console.error('Error:', error);
// Report failure
await codePipelineClient.send(new PutJobFailureResultCommand({
jobId: event['CodePipeline.job'].id,
failureDetails: {
type: 'JobFailed',
message: error.message
}
}));
throw error;
}
}
# IAM role for the update function
UpdateLambdaRole:
Type: AWS::IAM::Role
Properties:
RoleName: ${self:custom.prefix}-update-function-role
AssumeRolePolicyDocument:
Version: '2012-10-17'
Statement:
- Effect: Allow
Principal:
Service: lambda.amazonaws.com
Action: sts:AssumeRole
ManagedPolicyArns:
- arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole
Policies:
- PolicyName: UpdateFunctionPolicy
PolicyDocument:
Version: '2012-10-17'
Statement:
- Effect: Allow
Action:
- lambda:UpdateFunctionCode
Resource: !GetAtt ScannerLambdaFunction.Arn
- Effect: Allow
Action:
- s3:GetObject
Resource: !Sub ${PipelineArtifactBucket.Arn}/*
- Effect: Allow
Action:
- codepipeline:PutJobSuccessResult
- codepipeline:PutJobFailureResult
Resource: "*"
- PolicyName: CloudWatchLogsPolicy
PolicyDocument:
Version: '2012-10-17'
Statement:
- Effect: Allow
Action:
- logs:CreateLogGroup
- logs:CreateLogStream
- logs:PutLogEvents
Resource: "*"
# Update CodePipeline Deploy stage
ClamAVPipeline:
Type: AWS::CodePipeline::Pipeline
Properties:
Name: ${self:custom.prefix}-clamav-pipeline
RoleArn: !GetAtt CodePipelineServiceRole.Arn
ArtifactStore:
Type: S3
Location: !Ref PipelineArtifactBucket
EncryptionKey:
Id: alias/aws/s3
Type: KMS
Stages:
- Name: Source
Actions:
- Name: Source
ActionTypeId:
Category: Source
Owner: AWS
Provider: CodeStarSourceConnection
Version: '1'
Configuration:
ConnectionArn: !Ref CodeStarConnection
FullRepositoryId: ${self:custom.githubOwner}/${self:custom.githubRepo}
BranchName: ${self:custom.githubBranch}
DetectChanges: true
OutputArtifacts:
- Name: SourceOutput
RunOrder: 1
- Name: Build
Actions:
- Name: BuildImage
ActionTypeId:
Category: Build
Owner: AWS
Provider: CodeBuild
Version: '1'
Configuration:
ProjectName: !Ref ClamAVBuildProject
InputArtifacts:
- Name: SourceOutput
OutputArtifacts:
- Name: BuildOutput
RunOrder: 1
- Name: Deploy
Actions:
- Name: UpdateFunction
ActionTypeId:
Category: Invoke
Owner: AWS
Provider: Lambda
Version: '1'
Configuration:
FunctionName: !Ref UpdateLambdaFunction
InputArtifacts:
- Name: BuildOutput
RunOrder: 1
# Pipeline Artifact Bucket
PipelineArtifactBucket:
Type: AWS::S3::Bucket
Properties:
BucketName: ${self:custom.prefix}-pipeline-artifacts
VersioningConfiguration:
Status: Enabled
PublicAccessBlockConfiguration:
BlockPublicAcls: true
BlockPublicPolicy: true
IgnorePublicAcls: true
RestrictPublicBuckets: true
LifecycleConfiguration:
Rules:
- Id: DeleteOldArtifacts
Status: Enabled
ExpirationInDays: 30
BucketEncryption:
ServerSideEncryptionConfiguration:
- ServerSideEncryptionByDefault:
SSEAlgorithm: AES256
# IAM Roles
CodeBuildServiceRole:
Type: AWS::IAM::Role
Properties:
RoleName: ${self:custom.prefix}-codebuild-role
AssumeRolePolicyDocument:
Version: '2012-10-17'
Statement:
- Effect: Allow
Principal:
Service: codebuild.amazonaws.com
Action: sts:AssumeRole
Policies:
- PolicyName: CodeBuildBasePolicy
PolicyDocument:
Version: '2012-10-17'
Statement:
- Effect: Allow
Action:
- logs:CreateLogGroup
- logs:CreateLogStream
- logs:PutLogEvents
Resource:
- !Sub arn:aws:logs:${AWS::Region}:${AWS::AccountId}:log-group:/aws/codebuild/${self:custom.prefix}-clamav-build:*
- Effect: Allow
Action:
- s3:GetObject
- s3:GetObjectVersion
- s3:PutObject
Resource:
- !Sub ${PipelineArtifactBucket.Arn}/*
- Effect: Allow
Action:
- ecr:GetAuthorizationToken
Resource: "*"
- Effect: Allow
Action:
- ecr:BatchCheckLayerAvailability
- ecr:GetDownloadUrlForLayer
- ecr:GetRepositoryPolicy
- ecr:DescribeRepositories
- ecr:ListImages
- ecr:DescribeImages
- ecr:BatchGetImage
- ecr:InitiateLayerUpload
- ecr:UploadLayerPart
- ecr:CompleteLayerUpload
- ecr:PutImage
Resource: !GetAtt ClamAVRepository.Arn
- Effect: Allow
Action:
- lambda:InvokeFunction
Resource: !GetAtt UpdateLambdaFunction.Arn
CodePipelineServiceRole:
Type: AWS::IAM::Role
Properties:
RoleName: ${self:custom.prefix}-codepipeline-role
AssumeRolePolicyDocument:
Version: '2012-10-17'
Statement:
- Effect: Allow
Principal:
Service: codepipeline.amazonaws.com
Action: sts:AssumeRole
Policies:
- PolicyName: CodePipelineAccess
PolicyDocument:
Version: '2012-10-17'
Statement:
- Effect: Allow
Action:
- s3:GetObject
- s3:GetObjectVersion
- s3:GetBucketVersioning
- s3:PutObject
- s3:ListBucket
Resource:
- !GetAtt PipelineArtifactBucket.Arn
- !Sub ${PipelineArtifactBucket.Arn}/*
- Effect: Allow
Action:
- codebuild:BatchGetBuilds
- codebuild:StartBuild
Resource: !GetAtt ClamAVBuildProject.Arn
- Effect: Allow
Action:
- lambda:InvokeFunction
Resource: !GetAtt UpdateLambdaFunction.Arn
- Effect: Allow
Action:
- codestar-connections:UseConnection
Resource: !Ref CodeStarConnection
- Effect: Allow
Action:
- iam:PassRole
Resource: '*'
Condition:
StringEquals:
iam:PassedToService:
- codebuild.amazonaws.com
- lambda.amazonaws.com
# EventBridge Rule for nightly builds
NightlyBuildRule:
Type: AWS::Events::Rule
Properties:
Name: ${self:custom.prefix}-nightly-build
Description: "Trigger ClamAV build pipeline nightly"
ScheduleExpression: "cron(0 0 * * ? *)"
State: ENABLED
Targets:
- Arn: !Sub arn:aws:codepipeline:${AWS::Region}:${AWS::AccountId}:${ClamAVPipeline}
Id: NightlyBuildTarget
RoleArn: !GetAtt EventBridgeRole.Arn
EventBridgeRole:
Type: AWS::IAM::Role
Properties:
RoleName: ${self:custom.prefix}-eventbridge-role
AssumeRolePolicyDocument:
Version: '2012-10-17'
Statement:
- Effect: Allow
Principal:
Service: events.amazonaws.com
Action: sts:AssumeRole
Policies:
- PolicyName: StartPipeline
PolicyDocument:
Version: '2012-10-17'
Statement:
- Effect: Allow
Action: codepipeline:StartPipelineExecution
Resource: !Sub arn:aws:codepipeline:${AWS::Region}:${AWS::AccountId}:${ClamAVPipeline}
Create a new file called `freshclam.conf` in the root of your project and add the following content:
# Database mirror
DatabaseMirror database.clamav.net
# Database directory
DatabaseDirectory /var/lib/clamav
# Update log file
UpdateLogFile /var/log/clamav/freshclam.log
# Log time
LogTime yes
# PID file
PidFile /var/run/clamav/freshclam.pid
# Database owner
DatabaseOwner clamav
# Log file size limit
LogFileMaxSize 2M
Create a new file called `scanner.ts` in the `src` directory and add the following content:
import {
S3Client,
GetObjectCommand,
PutObjectTaggingCommand,
} from "@aws-sdk/client-s3";
import {
CodePipelineClient,
PutJobSuccessResultCommand,
PutJobFailureResultCommand,
} from "@aws-sdk/client-codepipeline";
import { S3Event } from "aws-lambda";
import { execFile } from "child_process";
import { promisify } from "util";
import * as fs from "fs";
import * as path from "path";
import { Readable } from "stream";
const execFileAsync = promisify(execFile);
const s3Client = new S3Client({
region: process.env.AWS_REGION,
});
const codePipelineClient = new CodePipelineClient({
region: process.env.AWS_REGION,
});
// Maximum file size to scan (100MB)
const MAX_FILE_SIZE = 100 * 1024 * 1024;
// Path to ClamAV definitions in the container
const CLAMAV_DEFINITIONS_PATH = "/opt/clamav/definitions";
/**
* Lambda function handler that processes both S3 events and CodePipeline job notifications
* @param event - Either an S3Event or CodePipeline job notification
*/
export async function handler(event: any) {
// Check if this is a CodePipeline job notification
  // CodePipeline's Lambda invoke event nests the job under the literal
  // "CodePipeline.job" key
  if (event["CodePipeline.job"]) {
    const jobId = event["CodePipeline.job"].id;
try {
// For CodePipeline jobs, we just need to acknowledge success
// The actual update is handled by Lambda's image configuration
await codePipelineClient.send(
new PutJobSuccessResultCommand({
jobId,
})
);
console.log("Successfully notified CodePipeline of job completion");
} catch (error) {
console.error("Failed to notify CodePipeline:", error);
await codePipelineClient.send(
new PutJobFailureResultCommand({
jobId,
failureDetails: {
message: error instanceof Error ? error.message : "Unknown error",
type: "JobFailed",
},
})
);
}
return;
}
// Handle S3 events
const s3Event = event as S3Event;
// 1) For each S3 record (object created event)
for (const record of s3Event.Records) {
const bucketName = record.s3.bucket.name;
const objectKey = decodeURIComponent(
record.s3.object.key.replace(/\+/g, " ")
);
const objectSize = record.s3.object.size;
console.log(`Processing object: ${bucketName}/${objectKey}`);
// Check file size
if (objectSize > MAX_FILE_SIZE) {
console.warn(`Object too large to scan: ${objectSize} bytes`);
await tagObject(bucketName, objectKey, "TOO_LARGE");
continue;
}
try {
// 2) Download the object to /tmp
const localObjectPath = path.join("/tmp", path.basename(objectKey));
await downloadFile(bucketName, objectKey, localObjectPath);
try {
// 3) Run clamscan with container-bundled definitions
console.log("Running virus scan...");
let scanResult;
try {
scanResult = await execFileAsync("/opt/clamav/bin/clamscan", [
`--database=${CLAMAV_DEFINITIONS_PATH}`,
localObjectPath,
]);
} catch (error: any) {
// ClamAV returns exit code 1 when a virus is found
// This is expected behavior, not an error
if (error.code === 1) {
scanResult = {
stdout: error.stdout,
stderr: error.stderr,
};
} else {
// Real error occurred
throw error;
}
}
console.log("clamscan output:", scanResult.stdout);
if (scanResult.stderr)
console.error("clamscan errors:", scanResult.stderr);
// Check if the output indicates a virus
const status = scanResult.stdout.includes("FOUND")
? "INFECTED"
: "CLEAN";
console.log(`Scan result for ${objectKey}: ${status}`);
// 4) Tag the S3 object with the scan result
await tagObject(bucketName, objectKey, status);
} finally {
// Clean up scanned file
if (fs.existsSync(localObjectPath)) {
try {
fs.unlinkSync(localObjectPath);
} catch (error) {
console.error(`Error deleting file ${localObjectPath}:`, error);
}
}
}
} catch (error) {
console.error(`Error processing ${objectKey}:`, error);
await tagObject(bucketName, objectKey, "ERROR");
}
}
}
/**
* Tags an S3 object with the scan result.
*/
async function tagObject(
bucket: string,
key: string,
status: string
): Promise<void> {
try {
const command = new PutObjectTaggingCommand({
Bucket: bucket,
Key: key,
Tagging: {
TagSet: [
{
Key: "av-status",
Value: status,
},
{
Key: "av-timestamp",
Value: new Date().toISOString(),
},
],
},
});
await s3Client.send(command);
} catch (error) {
console.error(`Error tagging object ${bucket}/${key}:`, error);
throw error;
}
}
/**
* Downloads an S3 object to a local path.
*/
async function downloadFile(
bucket: string,
key: string,
localPath: string
): Promise<void> {
try {
const command = new GetObjectCommand({
Bucket: bucket,
Key: key,
});
const response = await s3Client.send(command);
if (!response.Body) {
throw new Error(`Empty response body for ${bucket}/${key}`);
}
const body = response.Body as Readable;
const writeStream = fs.createWriteStream(localPath);
return new Promise((resolve, reject) => {
body.pipe(writeStream);
body.on("error", reject);
writeStream.on("finish", resolve);
writeStream.on("error", reject);
});
} catch (error) {
console.error(`Error downloading ${bucket}/${key}:`, error);
throw error;
}
}
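Since aws-sdk-client-mock and Jest are already in the devDependencies, the size-limit branch of the handler can be unit-tested without AWS access or a local ClamAV install. Here is a sketch of what such a test might look like; the file path matches the src/__tests__/ exclude pattern in serverless.yml, and the event payload and assertions are illustrative:
// src/__tests__/scanner.test.ts
import { mockClient } from "aws-sdk-client-mock";
import { S3Client, PutObjectTaggingCommand } from "@aws-sdk/client-s3";
import { handler } from "../scanner";

// Stubs S3Client.prototype.send, so the client created inside scanner.ts
// is mocked as well.
const s3Mock = mockClient(S3Client);

beforeEach(() => s3Mock.reset());

test("objects over the size limit are tagged TOO_LARGE without being downloaded", async () => {
  s3Mock.on(PutObjectTaggingCommand).resolves({});

  const event = {
    Records: [
      {
        s3: {
          bucket: { name: "my-bucket" },
          object: { key: "userdata/uploads/huge.bin", size: 500 * 1024 * 1024 },
        },
      },
    ],
  };

  await handler(event as any);

  const calls = s3Mock.commandCalls(PutObjectTaggingCommand);
  expect(calls).toHaveLength(1);
  expect(calls[0].args[0].input.Tagging?.TagSet?.[0]).toEqual({
    Key: "av-status",
    Value: "TOO_LARGE",
  });
});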
Now we need to build a Lambda layer that provides the adm-zip dependency for the update function invoked during the CodePipeline job. Run the following commands to build the layer:
# Create layer directory
mkdir -p lambda-layers/adm-zip/nodejs
cd lambda-layers/adm-zip/nodejs
# Initialize package.json and install dependencies
npm init -y
npm install adm-zip @types/adm-zip
# Create layer zip
cd ..
zip -r adm-zip.zip nodejs/
# Create the layer in AWS (Note: Use the same region as your deployment)
aws lambda publish-layer-version \
--layer-name adm-zip \
--description "AdmZip for creating zip files" \
--license-info "MIT" \
--zip-file fileb://adm-zip.zip \
--compatible-runtimes nodejs20.x \
--compatible-architectures x86_64 arm64 \
--region <your-region>
# Clean up
cd ../..
rm -rf lambda-layers
Finally, deploy the Serverless stack:
npx serverless deploy --verbose --stage prod
This will:
- Create all necessary IAM roles and policies
- Set up the ECR repository
- Configure CodeBuild and CodePipeline
- Deploy the Lambda functions
- Create the EventBridge rule for nightly builds
After deployment:
- Accept the CodeStar connection in the AWS Console (first time only)
- The first build will start automatically after the CodeStar connection is established
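Once the scanner is running, downstream services can consult the av-status tag before serving or processing an upload, which is one practical answer to the "how to handle infected files?" question from the start of this post. A minimal sketch follows; the allow-only-CLEAN policy is an example, not something the stack enforces for you:
import { S3Client, GetObjectTaggingCommand } from "@aws-sdk/client-s3";

const s3 = new S3Client({});

// Example policy: only serve objects the scanner has explicitly tagged CLEAN.
// INFECTED, ERROR, TOO_LARGE, or not-yet-scanned objects are all rejected.
export async function isSafeToServe(bucket: string, key: string): Promise<boolean> {
  const { TagSet } = await s3.send(
    new GetObjectTaggingCommand({ Bucket: bucket, Key: key })
  );
  const status = TagSet?.find((tag) => tag.Key === "av-status")?.Value;
  return status === "CLEAN";
}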
Key Learnings
- Consider the Full Picture: While Lambda layers are excellent for code sharing, they're not always the best solution for complex runtime dependencies.
- Leverage Container Benefits: Container images provide more flexibility and control over the runtime environment, making them ideal for complex applications.
- Automate Everything: Our nightly build process ensures we always have fresh virus definitions without manual intervention.
- Think About Scale: The initial solution of downloading definitions per scan wouldn't scale well. Sometimes, a more complex architecture can actually be more efficient.
Results and Benefits
Our final solution provides:
- Automatic daily updates of virus definitions
- Proper runtime environment configuration
- Efficient scanning without repeated downloads
- Zero maintenance overhead
- Complete infrastructure as code deployment
Conclusion
Building a serverless virus scanner taught us valuable lessons about serverless architecture and its limitations. While our initial Lambda layer approach seemed simpler, the container-based solution proved more robust and maintainable in the long run.
Remember: the simplest solution isn't always the best one. Sometimes, embracing a bit more complexity in your architecture can lead to a more elegant and maintainable solution.
Future Improvements
We're considering several enhancements:
- Multi-region deployment for reduced latency
- Enhanced quarantine mechanisms
- Machine learning for improved detection
- Real-time threat intelligence integration
- Customized virus definitions for specific use cases