Containerizing a Running Application with AWS App2Container

Now that we have gone through containerizing an existing application where you have access to the source code, let’s look at containerizing a .NET application in a different way. This approach is for applications that are already running where you may not have access to the source code, you don’t control the deployment, or for some other reason you don’t want to change the source code as we did earlier. Instead, you want to containerize the application by simply “picking it up off its server” and moving it into a container. Until recently, that was not a simple thing to do. However, AWS created a tool to help you do just that. Let’s look at it now.

What is AWS App2Container?

AWS App2Container is a command-line tool designed to help migrate .NET web applications into a container format. You can learn more about the tool and download it at https://aws.amazon.com/app2container/. It also supports Java, but hey, we’re all about .NET, so we won’t talk about that anymore! You can see the process in Figure 1, but at a high level, there are five major steps.

Figure 1. How AWS App2Container works

These steps are:

1.      Inventory – This step examines the server, looking for running applications that can be containerized. At the time of writing, App2Container supports ASP.NET applications (version 3.5 and later) running in IIS 7.5 or later on Windows.

2.      Analyze – A chosen application is analyzed in detail to identify dependencies including known cooperating processes and network port dependencies. You can also manually add any dependencies that App2Container was unable to find.

3.      Containerize – In this step, all the application artifacts discovered during the “Analyze” phase are “dockerized.”

4.      Create – This step creates the various deployment artifacts (generally as CloudFormation templates) such as ECS task or Kubernetes pod definitions.

5.      Deploy – Store the image in Amazon ECR and deploy to ECS or EKS as desired.

There are three different modes in which you can use App2Container. The first is a mode where you perform the steps on two different machines. If using this approach, App2Container must be installed on both machines. The first machine, the Server, is the machine on which the application(s) that you want to containerize is running. You will run the first two steps on the server. The second machine, the Worker, is the machine that will perform the final three steps of the process based on artifacts that you copy from the server. The second mode is when you perform all the steps on the same machine, so it basically fills both the server and worker roles. The third mode is when you run all the commands on your worker machine, connecting to the server machine using the Windows Remote Management (WinRM) protocol. This approach has the benefit of not having to install App2Container on the server, but it also means that you must have WinRM installed and running. We will not be demonstrating this mode.

App2Container is a command-line tool that has some prerequisites that must be installed before the tool will run. These prerequisites are listed below.

·         AWS CLI – must be installed on both server and worker

·         PowerShell 5.0+ – must be installed on both server and worker

·         Administrator rights – You must be running as a Windows administrator

·         Appropriate permissions – You must have AWS credentials stored on the worker machine as was discussed in the earlier articles when installing the AWS CLI.

·         Docker tools – Docker version 17.07 or later must be installed on worker

·         Windows Server OS – Your worker system must run on a Windows OS version that supports containers, namely Windows Server 2016 or 2019. If working in server/worker mode, the server system must be running Windows Server 2008 or later.

·         Free Space – 20-30 GB of free space should be available on both server and worker

The currently supported application types are:

·         Simple ASP.NET applications running on a single server

·         A Windows service running on a single server

·         Complex ASP.NET applications that depend on WCF, running on a single server or multiple servers

·         Complex ASP.NET applications that depend on Windows services or processes outside of IIS, running on a single server or multiple servers

·         Complex, multi-node IIS or Windows service applications, running on a single server or multiple servers

There are also two types of applications that are not supported:

·         ASP.NET applications that use files and registry entries outside of IIS web application directories

·         ASP.NET applications that depend on features of a Windows operating system version prior to Windows Server Core 2016

Now that we have described App2Container as well as the .NET applications on which it will and will not work, the next step is to show how to use the tool.

Using AWS App2Container to Containerize an Application

We will first describe the application that we are going to containerize. We have installed a .NET Framework 4.7.2 application onto a Windows EC2 instance that supports containers; the AMI we used is shown in Figure 2. Please note that since AWS regularly revises its AMIs, you may see a different AMI ID.

Figure 2. AMI used to host the website to containerize

The application is connected to an RDS SQL Server instance for database access using Entity Framework, and the connection string is stored in the web.config file.

The next step, now that we have a running application, is to download the AWS App2Container tool. You can access the tool by going to https://aws.amazon.com/app2container/ and clicking the Download AWS App2Container button at the top of the page. This will bring you to the Install App2Container page in the documentation, which has a link to download a zip file containing the App2Container installation package. Download the file and extract it to a folder on the server. If you are working in server/worker mode, download and extract the file on both machines. After you unzip the downloaded file, you should have five files, one of which is another zipped file.

Open PowerShell and navigate to the folder containing App2Container. You must then run the install script.

PS C:\App2Container> .\install.ps1

You will see the script running through several checks and then present some terms and conditions text that will require you to respond with a y to continue. You will then be able to see the tool complete its installation.

The next step is to initialize and configure App2Container. If using server/worker mode, then you will need to do this on each machine. You start the initializing with the following command.

PS C:\App2Container> app2container init

It will then prompt you for a Workspace directory path for artifacts value. This is where the files from the analysis and any containerization will be stored. Press Enter to accept the default value or enter a new directory. It will then ask for an Optional AWS Profile. You can press Enter if you have a default profile set up, or enter the name of the profile to use if different.

Note: It is likely that a server running the application you want to containerize does not have the appropriate profile available. If not, you can set one up by running the aws configure command to configure the AWS CLI installation that App2Container will use to create and upload the container.
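A minimal sketch of that configuration is shown below; the access key, secret, and region values are placeholders, so substitute your own credentials and preferred region.

PS C:\> aws configure
AWS Access Key ID [None]: AKIAEXAMPLEACCESSKEY
AWS Secret Access Key [None]: exampleSecretAccessKey
Default region name [None]: us-east-2
Default output format [None]: json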

Next, the initialization will ask you for an Optional S3 bucket for application artifacts. Providing a value in this step will result in the tool output also being copied to the provided bucket. You can press Enter to use the default of “no bucket”; however, at the time of this writing you must have this value configured so that it can act as storage for moving the container image into ECR. We used an S3 bucket called “prodotnetonaws-app2container”. The next initialization step asks whether you wish to Report usage metrics to AWS? (Y/N). No personal or confidential information is gathered, so we recommend that you press Enter to accept the default of “Y”. The following initialization prompt asks if you want to Automatically upload logs and App2Container generated artifacts on crashes and internal errors? (Y/N). We want AWS to know as soon as possible if something goes wrong, so we selected “y”. The last initialization prompt asks whether to Require images to be signed using Docker Content Trust (DCT)? (Y/N). We selected the default value, “n”. The initialization will then display the path in which the artifacts will be created and stored. Figure 3 shows our installation when completed.

Figure 3. Output from running the App2Container initialization

For those of you using the server/worker mode approach, take note of the application artifact directory displayed in the last line of the command output, as it will contain the artifacts that you will need to move to the worker machine. Now that App2Container is initialized, the next step is to take an inventory of the eligible applications running on the server. You do this by issuing the following command:

PS C:\App2Container> app2container inventory

The output from this command is a JSON object collection that has one entry for each application. The output on our EC2 server is shown below:

{
    "iis-demo-site-a7b69c34": {
        "siteName": "Demo Site",
        "bindings": "http/*:8080:",
        "applicationType": "IIS"
    },
    "iis-tradeyourtools-6bc0a317": {
        "siteName": "TradeYourTools",
        "bindings": "http/*:80:",
        "applicationType": "IIS"
    }
}

As you can see, there are two applications on our server: the “Trade Your Tools” app we described earlier as well as another website, “Demo Site”, that is running under IIS and is bound to port 8080. The key for each entry is the application ID, which you will need moving forward.

Note: You can only containerize one application at a time. If you wish to containerize multiple applications from the same server you will need to repeat the following steps for each one of those applications.

The next step is to analyze the specific application that you are going to containerize. You do that with the following command, replacing the application ID (APPID) in the command with your own.

PS C:\App2Container> app2container analyze --application-id APPID

You will see progress output flash by as the tool analyzes the application, and when it is complete you will get output like that shown in Figure 4.

Figure 4. Output from running the App2Container analyze command

The primary output from this analysis is the analysis.json file that is listed in the command output. Locating and opening that file will allow you to see the information that the tool gathered about the application, much of which is a capture of the IIS configuration for the site running your application. We won’t show the contents of the file here as it is several hundred lines long; however, much of the content of this file can be edited as you see necessary.

The next steps branch depending upon whether you are using a single server or using the server/worker mode.

When containerizing on a single server

Once you are done reviewing the artifacts created from the analysis, the next step is to containerize the application. You do this with the following command:

PS C:\App2Container> app2container containerize --application-id APPID

The processing in this step may take some time to run, especially if, like us, you used a free-tier low-powered machine! Once completed, you will see output like Figure 5.

Figure 5. Output from containerizing an application in App2Container

At this point, you are ready to deploy your container and can skip to the next article, “Deploying…”, if you don’t care about containerizing using server/worker mode.

When containerizing using server/worker mode

Once you are done reviewing the artifacts created from the analysis, the next step is to extract the application. This will create the archive that will need to be moved to the worker machine for containerizing. If you provided an S3 bucket during initialization, the tool will also upload the archive to that bucket; if not, you must copy the file manually. The command to extract the application is:

PS C:\App2Container> app2container extract --application-id APPID

This command will process, and you should get a simple “Extraction successful” message.

Returning to the artifact directory that was displayed when initializing App2Container, you will see a new zip file named with your Application ID. Copy this file to the worker server.

Once you are on the worker machine and App2Container has been initialized, the next step is to containerize the content from the archive. You do that with the following command:

PS C:\App2Container> app2container containerize --input-archive PathToZip

The output from this step matches the output from running the containerization on a single server and can be seen in Figure 5 above.

The next article will show how to deploy this containerized application into AWS.

Containerizing a .NET Core-based Application for AWS

In our last post in this series, we talked about containerizing a .NET 4.x application for deployment onto AWS, and as you may have seen it was a somewhat convoluted affair. Containerizing a .NET Core-based application is much easier, because many of the hoops that you must jump through to manage a Windows container are not necessary. Instead, AWS services, as well as the IDEs, support this out of the gate.

Using Visual Studio

We have already gone through adding container support using Visual Studio, and doing it now with a .NET Core-based application does not change that part of the process at all. What does change, however, is the ease of getting the newly containerized application into AWS. Once the Dockerfile has been added, the “Publish to AWS” options available when right-clicking on the project name in the Solution Explorer are greatly expanded. Since our objective is to get this application deployed to Amazon ECR, choose Push Container Images to Amazon Elastic Container Registry and click the Publish button. You will see the process walk through a few steps, and it will end with a message stating that the image has been successfully deployed into ECR.

Using JetBrains Rider

The process of adding a container using JetBrains Rider is very similar to the process used in Visual Studio. Open your application in Rider, right-click the project, select Add, and then Docker Support as shown in Figure 1.

Figure 1. Adding Docker Support in JetBrains Rider

This will bring up a window where you select the Target OS, in this case Linux. Once you have finished this, a Dockerfile will show up in your solution. Unfortunately, the AWS Toolkit for Rider does not currently support deploying the new container image to ECR. This means that any deployment to the cloud must be done with the AWS CLI or the AWS Tools for PowerShell and would be the same as the upload process used when storing a Windows container in ECR that we went over in an earlier post; a sketch of that approach follows.
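As a rough sketch, the flow mirrors the PowerShell-based upload used for the Windows container: refresh the ECR authentication token, then build, tag, and push. The repository name, account ID, and region below are hypothetical placeholders; substitute the values from your own ECR repository.

PS C:\MyProject> $loginCommand = Get-ECRLoginCommand -Region us-east-2
PS C:\MyProject> Invoke-Expression $loginCommand.Command
PS C:\MyProject> docker build -t my-netcore-app .
PS C:\MyProject> docker tag my-netcore-app:latest 123456789012.dkr.ecr.us-east-2.amazonaws.com/my-netcore-app:latest
PS C:\MyProject> docker push 123456789012.dkr.ecr.us-east-2.amazonaws.com/my-netcore-app:latest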

As you can see, containerizing a .NET Core based application is much easier to do as well as easier to deploy into AWS.

Containerizing a .NET Framework 4.x Application for AWS

In this post we are going to demonstrate ways in which you can containerize your applications for deployment into the cloud, the next step in minimizing resource usage and likely saving money. This article is different from the previous entries in this series: those were a discussion of containers and running them within the AWS infrastructure, while this post is much more practical, focused on getting to that point from an existing, non-containerized application.

Using Visual Studio

Adding container support using Visual Studio is straightforward.

Adding Docker Support

Open an existing ASP.NET application targeting .NET Framework 4.7, or create a new one. Once it is open, right-click on the project name, select Add, and then Docker Support as shown in Figure 1.

Figure 1. Adding Docker Support to an application.

Your Output view, when set to show output from Container Tools, will show multiple steps being performed, and then it should finish successfully. When completed, you will see two new files in the Solution Explorer: a Dockerfile and a subordinate .dockerignore file. You will also see that your default Debug setting has changed to Docker. You can see both changes in Figure 2.

Figure 2. Changes in Visual Studio after adding Docker support

You can test the support by clicking the Docker button. This will build the container, run it under your local Docker Desktop, and then open your default browser. This time, rather than going to a localhost URL you will instead go to an IP address, and if you compare the IP address in the URL to your local IP you will see that they are not the same. That is because this new IP address points to the container running on your system.

Before closing the browser and stopping the debug process, you will be able to confirm that the container is running by using the Containers view in Visual Studio as shown in Figure 3.

Figure 3. Using the Containers view in Visual Studio to see the running container

You can also use Docker Desktop to view running containers. Open Docker Desktop and select Containers / Apps. This will bring you to a list of the running containers and apps, one of which will be the container that you just started as shown in Figure 4.

Figure 4. Viewing a running container in Docker Desktop

Once these steps have been completed, you are ready to save your container in ECR, just as we covered earlier in this series.

Deploying your Windows Container to ECR

However, there is a complication: the AWS Toolkit for Visual Studio does not support the container deployment options we saw earlier when working with Windows containers. Instead, we are going to use the AWS Tools for PowerShell to build and publish your image to ECR. At a high level, the steps are:

·         Build your application in Release mode. This is the only way that Visual Studio puts the appropriate files in the right place, namely the obj\Docker\publish subdirectory of your project directory. You can see this value called out in the last line of your Dockerfile: COPY ${source:-obj/Docker/publish} .

·         Refresh your ECR authentication token. You need this later in the process so that you can log in to ECR to push the image.

·         Build the Docker image.

·         Tag the image. This associates the image with the repository.

·         Push the image to the server. This copies the image into ECR.

Let’s walk through them now. The first step is to build your application in Release mode. However, before you can do that, you will need to stop your currently running container. You can do that through either Docker Desktop or the Containers view in Visual Studio. If you do not do this, your build will fail because you will not be able to overwrite the necessary files. Once that is completed, your Release mode build should run without problem.

Next, open PowerShell and navigate to your project directory. This directory needs to be the one that contains the Dockerfile. The first thing we will do is set the authentication context. We do that by first getting the command to execute and then executing that command, which is why this process has two steps.

$loginCommand = Get-ECRLoginCommand -Region <repository region>

And then

Invoke-Expression $loginCommand.Command

This refreshes the authentication token for ECR. The remaining commands assume an existing ECR repository. You can access this information through the AWS Explorer by clicking on the repository name. This will bring up the details page as shown in Figure 5.

Figure 5. Repository details shown in the AWS Explorer

The value marked 1 is the repository name and the value marked 2 is the repository URI. You will need both of those values for the remaining steps. Build the image:

docker build -t <repository> .

The next step is to tag the image. In this example we are marking this version as the latest by appending “:latest” to both the repository name and the URI.

docker tag <repository>:latest <URI>:latest

The last step is to push the image to the server:

docker push <URI>:latest

You will see a lot of work going on as everything is pushed to the repository, but eventually it will finish processing and you will be able to see your new image in the repository.

Note: Not all container services on AWS support Windows containers. Amazon ECS on AWS Fargate is one of the services that does as long as you make the appropriate choices as you configure your tasks. There are detailed directions to doing just that at https://aws.amazon.com/blogs/containers/running-windows-containers-with-amazon-ecs-on-aws-fargate/.

While Visual Studio offers a menu-driven approach to containerizing your application, you always have the option to containerize your application manually.

Containerizing Manually

Containerizing an application manually requires several steps. You’ll need to create your Dockerfile and then coordinate the build of the application so that it works with the Dockerfile you created. We’ll walk through those steps using JetBrains Rider. The first thing you’ll need to do is add a file named Dockerfile to your sample application. This file needs to be in the root of your active project directory. Once you have added it to the project, right-click the file to open the Properties window and change the Build action to None and the Copy to output directory to Do not copy as shown in Figure 6.

Figure 6. Build properties for the new Docker file

This is important because it makes sure that the Docker file itself will not end up deployed into the container.

Now that we have the file, let’s start adding the instructions:

FROM mcr.microsoft.com/dotnet/framework/aspnet:4.8-windowsservercore-ltsc2019
ARG source
WORKDIR /inetpub/wwwroot

These commands define the source image with FROM, define a build argument, and set the working directory in which the code will run in the container. The source image that we have defined includes support for ASP.NET and .NET Framework 4.8, mcr.microsoft.com/dotnet/framework/aspnet:4.8, and is based on Windows Server 2019, windowsservercore-ltsc2019. There is an image for Windows Server 2022, windowsservercore-ltsc2022, but it may not be usable for you if you are not running the most current version of Windows on your machine.

The last thing that we need to do is configure the Dockerfile to include the compiled application. However, before we can do that, we need to build the application in such a way that we can access the deployed bits. This is done by publishing the application. In Rider, you publish the application by right-clicking on the project and selecting the Publish option. This will give you the option to publish to either a Local folder or Server. Choosing a local folder brings up the configuration screen where you can select the directory in which to publish, as shown in Figure 7.

Figure 7. Selecting a publish directory

It will be easiest if you select a directory underneath the project directory; we recommend somewhere within the bin directory so that the IDEs will tend to ignore it. Clicking the Run button will publish the app to that directory. The last step is to add one more command to the Dockerfile, pointing the source argument to the directory in which you published the application:

COPY ${source:-bin/release} .

Once you add this last line into the Dockerfile, you are ready to deploy the Windows container to ECR using the steps that we went through in the last section.
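For reference, the complete Dockerfile at this point consists of just the four instructions covered above (the bin/release path assumes the publish directory chosen in the previous step):

FROM mcr.microsoft.com/dotnet/framework/aspnet:4.8-windowsservercore-ltsc2019
ARG source
WORKDIR /inetpub/wwwroot
COPY ${source:-bin/release} .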

Now that we have walked through two different approaches for containerizing your older .NET Framework-based Windows application, the next step is to do the same with a .NET Core-based application. As you will see, this process is a lot easier because we will build the application onto a Linux-based container so you will see a lot of additional support in the IDEs. Let’s look at that next.

Amazon RDS Oracle for .NET Developers

The last database available in RDS that we will go over is the oldest commercial SQL-based database management system, Oracle. While originally strictly relational, Oracle is now considered a multi-model database management system, which means that it can support multiple data models, such as document, graph, relational, and key-value, rather than simply supporting relational data like many of the systems we have been talking about up until now. It is also the database of choice for many different packaged software systems and is generally believed to have the largest RDBMS market share (based on revenue), which means that it would not be surprising to be a .NET developer and yet be working with Oracle. And Amazon RDS makes it easy to do that in the cloud.

Oracle and .NET

Let’s first talk about using Oracle as a .NET developer. Since Oracle is a commercial database system, unlike the rest of the systems we have talked about in this series, it has a lot of additional tools designed to help .NET developers interact with Oracle products. The first of these is the Oracle Developer Tools for Visual Studio.

Oracle Developer Tools for Visual Studio

There are a lot of .NET applications based upon Oracle, which means that it is to Oracle’s advantage to make that interaction as easy as possible. One of the ways that they did this was to create the Oracle Developer Tools for Visual Studio (ODT for VS). This tool runs within Visual Studio 2017 or 2019 (2022 was not supported at the time of this writing) and brings in features designed to provide insight and improve the developer experience. Examples of the features within this tool include:

·         Database browsing – Use Server Explorer to browse your Oracle database schemas and to launch the appropriate designers and wizards to create and alter schema objects.

·         Schema comparison – View differences between two different schemas and generate a script that can modify the target schema to match the source schema. You can do this by connecting to live databases or by using scripts within an Oracle Database project.

·         Entity Framework support – Use Visual Studio’s Entity Designer for Database First and Model First object-relational mapping. (“Code First” is also supported).

·         Automatic code generation – You can use various windows, designers, and wizards to drag and drop and automatically generate .NET code.

·         PL/SQL Editor and debugger – Allows you to take advantage of Visual Studio’s debugging features from within PL/SQL code, including seamlessly stepping from .NET code into your PL/SQL code and back out again.

You need to have a free Oracle account before you can download the tools from https://www.oracle.com/database/technologies/net-downloads.html. Please note that installing these tools will also install functionality to interact with Oracle Cloud, but those details are for a different article! Once the tools are downloaded and installed you will see a new section in your Tools menu as shown in Figure 1.

Figure 1. New features added to Tools menu by ODT for VS

You will also find four new project templates added to the Create a new project wizard:

·         Visual C# Oracle CLR project – creates a C#-based project for creating classes to use in an Oracle database

·         Visual Basic Oracle CLR project – creates a Visual Basic project for creating classes to use in an Oracle database

·         Oracle Database project – creates a project for maintaining a set of scripts that can be generated using Server Explorer menus. This project type does NOT support schema comparison.

·         Oracle Database project Version 2 – creates a project for maintaining a standardized set of SQL scripts that represent your Oracle database schema. This project type supports schema comparison.

There are additional features to these tools, so suffice to say that Oracle provides various ways to help .NET developers interact with their Oracle databases. Lots of ways. Many more than you will find for any of the other databases we have looked at in this series. And it should not surprise you to find that they also support connecting to Oracle databases from within your .NET application.

Oracle Data Provider for .NET (ODP.NET)

Where the ODT for VS is designed to help improve a developer’s productivity when interacting with Oracle databases, ODP.NET manages the interconnectivity between .NET applications and Oracle databases. ODP.NET does this by providing several NuGet packages: Oracle.ManagedDataAccess.Core and Oracle.EntityFrameworkCore, which support .NET 5 and more recent versions, and Oracle.ManagedDataAccess and Oracle.ManagedDataAccess.EntityFramework, which support .NET versions prior to 5.0. Once you have the packages, the next thing that you need to do is to configure your application to use Oracle. You do this by using the UseOracle method when overriding the OnConfiguring method in the context class as shown below:

protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
{
    optionsBuilder.UseOracle("connection string here");
}

A connection string for Oracle has three required fields:

·         User Id – username for use with the connection

·         Password – password

·         Data Source – the Transparent Network Substrate (TNS) name, which is the name of the entry for the database server in the tnsnames.ora file. This file can be found in the $ORACLE_HOME/network/admin directory.

This makes managing a connection string seem like an easy task. There is, of course, a caveat: you must be willing to deploy a file that has to be in a very specific place on the server and that contains a reference to the server to which you need to connect. If you are okay with that approach, then this is a simple connection string: “user id=prodotnetonaws;password=password123;data source=OrcleDB”. However, since a lot of the flexibility inherent in the cloud goes away if you make this a requirement (you are no longer deploying just your application), you will instead have to build a much uglier connection string using a Connect Descriptor:

“user id=prodotnetonaws;password=password123;data source=(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=servernamehere)(PORT=1521))(CONNECT_DATA=(SID=databasename)))”

This means that we will need to build our connection string with two additional values, as sketched in the example after this list:

  • Host – The address of the server to which the application will connect
  • SID – The database, on the host server, to which the application is connecting
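As a minimal sketch, using the same placeholder host and database values as above (replace them with your own server endpoint and database name), the context configuration from earlier becomes:

protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
{
    // The HOST and SID values below are placeholders for your server endpoint and database name
    optionsBuilder.UseOracle(
        "user id=prodotnetonaws;password=password123;" +
        "data source=(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=servernamehere)(PORT=1521))(CONNECT_DATA=(SID=databasename)))");
}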

Let’s now set up our Oracle database and see where to get those values.

Setting up an Oracle Database on Amazon RDS

Now that we know how to set up our .NET application to access an Oracle database, let’s go look at setting up an Oracle instance. First, log into the console, go to RDS, and select Create database. On the Create Database screen, select Standard create and then Oracle. This will bring up the remainder of that section as shown in Figure 2.

Figure 2. Options after selecting Oracle when creating a new RDS Database

As you can see, your next option is to select the Database management type, for which there are two options, the default Amazon RDS and Amazon RDS Custom. The Amazon RDS Custom management type requires you to upload your own installation files and patches to Amazon S3. Selecting that management type will change the UI as shown in Figure 3.

Figure 3. Selecting the Amazon RDS Custom management type

In Amazon RDS Custom, a custom engine version (CEV) is a binary volume snapshot of a database engine and specific AMI. You first upload installation files and patches to Amazon S3 from which you create CEVs. These CEVs are used as the resources for your database. While this gives you much more control over the resources used by your database as well as managing the extra options you may have purchased as add-ons, it is out of scope for this article, so select Amazon RDS instead!

The next configuration option is a checkbox to Use multitenant architecture. This is a very interesting Oracle feature that allows for the concept of a container database (CDB) that contains one or more pluggable databases (PDB). A PDB is a set of schemas, objects, and related structures that appear logically to a client application as a separate, fully functional database. RDS for Oracle currently supports only 1 PDB for each CDB.

The next configuration option is the database Edition, with Oracle Enterprise Edition and Oracle Standard Edition Two currently the only available choices. When selecting the Enterprise edition, you will see that you must bring your own license; selecting the Standard edition, however, allows you either to bring your own license or to choose a license-included version. Standard edition is significantly less expensive, so you should consider that approach unless you need the full enterprise functionality. We chose the standard edition, license-included, most recent version.

Once you have gone through those, all the remaining sections are ones that you have seen before, as they are the same as those available for MySQL, MariaDB, and PostgreSQL (there is no serverless option like the one available with Amazon Aurora). However, completing them will not, by itself, allow our .NET application to connect automatically.

If we look back at our Oracle connection string:

“user id=prodotnetonaws;password=password123;data source=(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=servernamehere)(PORT=1521))(CONNECT_DATA=(SID=databasename)))”

There are two values that are needed: the server name and the database name. We know that once the server has been created there will be a server name, or host, but there is not yet a database with which to connect. Remember, the work you are doing right now is not creating the Oracle database; it is instead getting the Oracle server set up and available. You can create an initial database by expanding the Additional Configuration section and filling out the Initial database name field in the Database options section as shown in Figure 4.

Figure 4. Creating an initial database during setup

Add in an initial database name and complete the setup. Once you click the Create button, the process will start. However, since Oracle is a much more complicated server than any of the others, this initial creation and setup process will take considerably longer than it did with the other databases.

Once your database is available, clicking on the DB identifier will bring up the database details. This is where you will be able to see the endpoint of the server. Using that value plus the database name that you created during the setup process, you can finish updating your application to use Oracle as its primary database.

Amazon RDS – Aurora for .NET Developers 

Amazon Aurora is a MySQL and PostgreSQL-compatible relational database designed for the cloud. AWS claims that with some workloads Aurora can deliver up to 5x the throughput of MySQL and up to 3x the throughput of PostgreSQL without requiring any application changes. Aurora can do this because its storage subsystem was specifically designed to run on AWS’ fast distributed storage; in other words, Aurora was designed with cloud resources in mind, while those other “non-cloud only” databases are simply running on cloud resources. This design approach allows for automatic storage growth as needed, up to a cluster volume maximum size of 128 tebibytes (TiB) and offers 99.99% availability by replicating six copies of your data across three Availability Zones and backing up your data continuously to Amazon S3. It transparently recovers from physical storage failures; instance failover typically takes less than 30 seconds.

Note: A tebibyte (TiB) is a unit of measure used to describe computing capacity. The prefix tebi comes from the binary (power-of-2) system for measuring data capacity. A terabyte (the unit normally seen on disk drives and RAM) is a power-of-10 multiplier, a “simpler” way of looking at the value. Thus, one terabyte = 10^12 bytes, or 1,000,000,000,000 bytes, as opposed to one tebibyte, which equals 2^40 bytes, or 1,099,511,627,776 bytes.

Also, because of this customized design, Aurora can automate and standardize database replication and clustering. The last uniquely Aurora feature is the ability to use push-button migration tools to convert any already-existing RDS for MySQL and RDS for PostgreSQL applications to use RDS for Aurora instead. The argument for this ease in migration, and for Amazon Aurora in general, is that even though Aurora may be 20% more expensive than MySQL, Amazon claims that Aurora is 5x faster than MySQL, has 3x the throughput of standard PostgreSQL, and is able to scale to much larger datasets.

Creating an Amazon Aurora database in RDS

Let’s next look at creating a new Aurora database. First, log into the console, go to RDS, select Create database. On the Create Database screen, select Standard create and then Aurora.  This should bring up some Aurora-specific sections as shown in Figure 1.

Figure 1. Selecting edition and capacity type when building an Aurora database

The first selection, Edition, asks you to choose between a MySQL-compatible and a PostgreSQL-compatible edition.

MySQL compatible edition

The default selection when creating an Aurora database is MySQL, as shown above in Figure 1. By making this choice, values will be optimized for MySQL and default filters will be set for the options within the Available versions dropdown. The next area, Capacity type, provides two choices: Provisioned and Serverless. Selecting a provisioned capacity type will require you to select the number and class of instances that you will need to manage your workload, as well as to determine your preferred Availability & durability settings as shown in Figure 2.

Figure 2. Settings for creating a provisioned database

Selecting the serverless capacity type, on the other hand, simply requires you to select a minimum and maximum value for capacity units as shown in Figure 3. A capacity unit is comparable to a specific compute and memory configuration. Based on the minimum capacity unit setting, Aurora creates scaling rules for thresholds for CPU utilization, connections, and available memory. Aurora then reduces the resources for the DB cluster when its workload is below these thresholds, all the way down to the minimum capacity unit.

Figure 3. Capacity settings when creating a serverless database

You can also configure additional aspects of scaling using the Additional scaling configuration options. The first value is Autoscaling timeout and action. Aurora looks for a scaling point before changing capacity during the autoscaling process. A scaling point is a point in time when no transactions or long-running queries are in process. By default, if Aurora can’t find a scaling point within the specified timeout period, it will stop looking and keep the current capacity. You will need to choose the Force the capacity change option to make the change even without a scaling point; choosing this option can affect any in-process transactions and queries. The last selection is whether you want the database to Scale the capacity to 0 ACUs when cluster is idle. The name of the option pretty much tells the story; when that item is selected, your database will basically shut off when not being used. It will then scale back up as requests are generated. There will be a performance impact on that first call; however, you will also not be charged any processing fees while the database is idle.

The rest of the configuration sections on this page are the same as they have been for the previous RDS database engines that we posted about earlier.

PostgreSQL compatible edition

Selecting to create a PostgreSQL-compatible Aurora database will give you very similar options as you would get when selecting MySQL. You have the option to select either a Provisioned or Serverless capacity type, however, when selecting the serverless capacity type you will see that the default values are higher. While the 1 ACU setting is not available, the ability to scale to 0 capacity units when the cluster is idle is still supported.

There is one additional option available when creating a provisioned system, Babelfish settings. Aurora’s approach of building compatibility with the largest open-source relational database systems has proven successful for users of those systems. AWS took a first step toward compatibility with commercial software by releasing Babelfish for Aurora PostgreSQL. As briefly touched on earlier, Babelfish for Aurora PostgreSQL is a capability that enables Aurora to understand commands from applications written for Microsoft SQL Server, as shown in Figure 4.

Figure 4. Accessing Amazon Aurora through Babelfish

With Babelfish, Aurora PostgreSQL now “understands” T-SQL and supports the SQL Server communications protocol, so your .NET apps that were originally written for SQL Server will work with Aurora, hopefully with minimal code changes. Babelfish is a built-in capability of Amazon Aurora and has no additional cost, although it does require a version greater than PostgreSQL 13.4, which at the time of this writing was not available for Serverless and is why this option cannot be selected in that mode.

Amazon Aurora and .NET

As briefly touched on earlier, the primary outcome of choosing between PostgreSQL and MySQL compatibility is that the choice determines how you will interact with the database. Using the MySQL-compatible version of Aurora requires the MySql.EntityFrameworkCore NuGet packages, while connecting to the PostgreSQL-compatible edition requires the Npgsql and Npgsql.EntityFrameworkCore.PostgreSQL packages, just as they were used earlier in those sections of this series. If you are considering using Babelfish with the PostgreSQL-compatible edition, then you would use the standard SQL Server NuGet packages that we worked with in the last few posts. The sketch below shows how the context configuration maps to each choice.
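As a rough sketch (the extension-method names come from the respective providers named above, so treat them as assumptions if your package versions differ), the OnConfiguring override changes only in which Use* method you call:

protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
{
    // MySQL-compatible Aurora, using the MySql.EntityFrameworkCore provider
    optionsBuilder.UseMySQL("connection string here");

    // PostgreSQL-compatible Aurora, using Npgsql.EntityFrameworkCore.PostgreSQL
    // optionsBuilder.UseNpgsql("connection string here");

    // PostgreSQL-compatible Aurora through Babelfish, using the standard SQL Server provider
    // optionsBuilder.UseSqlServer("connection string here");
}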

This means that moving from MySQL on-premises to MySQL-compatible Aurora Serverless would require no code changes to the systems accessing the database; the only change you would have to manage is the connection string, so that you are pointing at the new database. The same is true for PostgreSQL, and even for SQL Server when using Babelfish. This approach to compatibility has made it much easier to move from well-known database systems to Amazon’s cloud-native database, Aurora.

Amazon RDS – PostgreSQL for .NET Developers

PostgreSQL is a free, open-source database that emphasizes extensibility and SQL compliance and was first released in 1996. A true competitor to commercial databases such as SQL Server and Oracle, PostgreSQL supports both online transaction processing (OLTP) and online analytical processing (OLAP) and has one of the most advanced performance features available, multi-version concurrency control (MVCC). MVCC supports the simultaneous processing of multiple transactions with almost no deadlock, so transaction-heavy applications and systems will most likely benefit from using PostgreSQL over SQL Server, and there are companies that use PostgreSQL to manage petabytes of data.

Another feature that makes PostgreSQL attractive is that not only does it support your traditional relational database approach, but it also fully supports a JSON/JSONB key/value storage approach that makes it a valid alternative to more traditional NoSQL databases; you can now use a single product to support the two most common data access approaches. Because of its enterprise-level features and the amount of work it takes to manage and maintain them, PostgreSQL is slightly more expensive to run on Amazon RDS than MySQL and MariaDB, even though it is also free, open-source software like they are.

PostgreSQL and .NET

As with any database products that you will access from your .NET application, its level of support for .NET is important. Fortunately for us, there is a large community involved in helping ensure that PostgreSQL is relevant to .NET users.

Let’s look at what you need to do to get .NET and PostgreSQL working together. The first thing you need to do is to include the necessary NuGet packages, Npgsql and Npgsql.EntityFrameworkCore.PostgreSQL as shown in Figure 1.

Figure 1. NuGet packages required to connect to PostgreSQL

Once you have the packages, the next thing that you need to do is to configure your application to use PostgreSQL. You do this by using the UseNpgsql method when overriding the OnConfiguring method in the context class as shown below:

protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
{
    optionsBuilder.UseNpgsql("connection string here");
}

A connection string for PostgreSQL has six required fields (an example follows the list):

  • server – server with which to connect
  • port – port number on which PostgreSQL is listening
  • user id – user name
  • password – password
  • database – database with which to connect
  • pooling – whether to use connection pooling (true or false)
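For example, a connection string built from those fields might look like the following; the server, credentials, and database name are placeholder values, and 5432 is PostgreSQL’s default port:

server=servernamehere;port=5432;user id=prodotnetonaws;password=password123;database=databasename;pooling=true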

When working in an ASP.NET Core application the connection string is added to the appsettings.json file as shown in Figure 2.

Figure 2. Adding a connection string to an ASP.NET Core application

Let’s now go create a PostgreSQL database.

Setting up a PostgreSQL Database on Amazon RDS

Now that we know how to set up our .NET application to access PostgreSQL, let’s go look at setting up a PostgreSQL instance. First, log into the console, go to RDS, and select Create database. On the Create Database screen, select Standard create and then PostgreSQL. You then have a lot of different versions that you can select from; however, the NuGet packages that we used in our earlier example require a reasonably modern version of PostgreSQL, so unless you have a specific reason to use an older version, you should use the default, most up-to-date version.

Once you have defined the version of PostgreSQL that you will use, your next option is to select the Template that you would like to use. Note that you only have two different templates to choose from:

·         Production – defaults are set to support high availability and fast, consistent performance.

·         Dev/Test – defaults are set in the middle of the range.

Note: Both MySQL and MariaDB had a third template, Free tier, that is not available when creating a PostgreSQL database. That does not mean that you must automatically pay, however, as the AWS Free Tier for Amazon RDS provides free use of Single-AZ micro DB instances running PostgreSQL. It is important to consider that the free usage tier is capped at 750 instance hours per month across all your RDS databases.

Selecting the template sets defaults across the rest of the setup screen and we will call those values out as we go through those items.

Once you select a template, your next setup area is Availability and durability. There are three options to choose from:

·         Multi-AZ DB cluster – As of the time of writing, this option is in preview. Selecting this option creates a DB cluster with a primary DB instance and two readable standby instances, with each instance in a different Availability Zone (AZ). It provides high availability and data redundancy, and it increases capacity to serve read workloads.

·         Multi-AZ DB instance – This option creates a primary DB instance and a standby DB instance in a different AZ. It provides high availability and data redundancy, but the standby instance doesn’t support connections for read workloads. This is the default value if you chose the Production template.

·         Single DB instance – This option creates a single DB instance with no standby instances. This is the default value if you chose the Dev/Test template.

The next section, Settings, is where you provide the DB instance identifier, or database name, and your Master username and Master password. Your database identifier value must be unique across all the database instances you have in the current region, regardless of engine option. You also have the option of having AWS auto-generate a password for you.

The next section allows you to select the DB instance class. You have the same three filters as before: Standard classes, Memory optimized classes, and Burstable classes. Selecting one of the filters changes the values in the instance drop-down box. To stay within the free tier, select Burstable classes and then one of the instances with micro in the name, such as db.t3.micro, as shown in Figure 3.

Figure 3. Selecting a free-tier compatible DB instance

The next section in the setup is the Storage section, with the same options that were available when going through the MySQL and MariaDB setups, though the default values may be different based upon the instance class that you selected. After the storage section are the Connectivity and Database authentication sections that we walked through earlier, so we will not go through them again now; they are standard across all RDS engine options. Selecting the Create database button will take you back to the RDS Databases screen, where you will get a notification that the database is being created as well as a button that you can click to access the connection details. Make sure you capture the password if you chose to have AWS auto-generate it, as you will only be able to access it this one time.

The pricing for PostgreSQL is slightly higher than MariaDB or MySQL when looking at compatible configurations, about 6% higher.

Selecting between PostgreSQL and MySQL/MariaDB

There are some significant differences between PostgreSQL and MySQL/MariaDB that can become meaningful when building your .NET application. Some of the more important differences are listed below. There are quite a few management and configuration differences, but those are not mentioned since RDS manages all of those for you!

·         Multi-Version Concurrency Control – PostgreSQL was the first DBMS to roll out multi-version concurrency control (MVCC), which means reading data never blocks writing data, and vice versa. If your database is heavily used for both reading and writing, then this may be a significant influencer.

·         More types supported – PostgreSQL natively supports NoSQL as well as a rich set of data types including Numeric Types, Boolean, Network Address, Bit String Types, and Arrays. It also supports JSON, hstore (a list of comma-separated key/value pairs), and XML, and users can even add new types.

·         Sequence support – PostgreSQL supports multiple tables taking their ids from the same sequence while MySQL/MariaDB do not.

·         Index flexibility – PostgreSQL can use functions and conditional indexes, which makes PostgreSQL database tuning very flexible, such as not having a problem if primary key values aren’t inserted sequentially.

·         Spatial capability – PostgreSQL has much richer support for spatial data management, quantity measurement, and geometric topology analysis.

While PostgreSQL is considered one of the most advanced databases around, that doesn’t mean that it should automatically be your choice. Many of the advantages listed above can be considered advanced functionality that you may not need. If you simply need a place to store rarely changing data, then MySQL/MariaDB may still be a better choice. Why? Because it is less expensive and performs better than PostgreSQL when performing simple reads with simple joins. As always, keep your use cases in mind when selecting your database.

Note: AWS contributes to an open-source project called Babelfish for PostgreSQL, which is designed to provide the capability for PostgreSQL to understand queries from applications written for Microsoft SQL Server. Babelfish understands the SQL Server wire-protocol and T-SQL. This understanding means that you can use SQL Server drivers for .NET to talk to PostgreSQL databases. As of this writing, this functionality is not yet available in the PostgreSQL version of RDS. It is, however, available for Aurora PostgreSQL. We will go over this in more detail later in the chapter. The project can be seen at https://www.babelfishpg.org.

MariaDB, MySQL, and PostgreSQL are all open-source databases that have existed for years and that you can use anywhere, including that old server under your desk. The next database we will talk about is only available in the cloud and within RDS, Amazon Aurora.

Amazon RDS – MariaDB for .NET Developers

MariaDB is a community-developed, commercially supported fork of MySQL that is intended to remain free and open-source software under the GNU General Public License (the same license that MySQL started under). As just mentioned, it was forked after MySQL’s acquisition by Oracle, because many of the original MySQL developers were afraid that, given how MySQL competed against the Oracle database, progress on MySQL would be slowed or stopped. MariaDB’s API and protocol are compatible with those used by MySQL, plus it adds some features to support native non-blocking operations and progress reporting. This means that all connectors, libraries, and applications which work with MySQL should also work with MariaDB. However, for some recent MySQL features, MariaDB either has no equivalent yet, such as geography, or deliberately chose not to be 100% compatible. This list of incompatibilities will likely continue to grow with each version.

MariaDB and .NET

Using .NET with MariaDB is easy to configure because of how similar the APIs are for MariaDB and MySQL. To be honest, they are so close to identical that the easiest way to consume MariaDB in a .NET application is to use the same MySQL NuGet package and connection approach that we went over in the last post, as the sketch below shows. The MariaDB team does not really spend any time building connectors, and instead works to ensure that the connectors that are out there, such as those built by the MySQL team, are compatible.
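As a minimal sketch, assuming the MySql.EntityFrameworkCore provider from the MySQL post (its extension method is UseMySQL) and a hypothetical RDS MariaDB endpoint, the context configuration is identical to the MySQL version; only the connection string values change:

protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
{
    // The endpoint, credentials, and database name are placeholders; 3306 is the default MariaDB/MySQL port
    optionsBuilder.UseMySQL("server=mymariadb.xxxxxxxxxxxx.us-east-2.rds.amazonaws.com;port=3306;user id=prodotnetonaws;password=password123;database=databasename");
}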

Setting up a MariaDB Database on Amazon RDS

Now that we know how to set up our .NET application to access MariaDB, let’s go look at setting up MariaDB in Amazon RDS. Log into the console, go to RDS, and select Create database. On the Create Database screen, select Standard create and then MariaDB. You will have a list of versions to choose from, starting with version 10.2 at the time of this writing up through the most recent release.

The rest of the set-up screen, surprisingly enough, will look eerily familiar if you just went through the MySQL setup steps; mainly because they are identical! You will have the same three options for the Template that you would like to use (Production, Dev/Test, and Free tier) as well as all of the configuration sections that follow.

Since we took the Free tier route with MySQL, let’s mix it up a little and go with the Dev/Test template for MariaDB, so we can talk about some of the areas that we glossed over when creating the MySQL database.

The first of these comes after you have created the database instance identifier and provided the master user information, and is entitled DB instance class. There are three instance class options available:

·         Standard classes (includes m classes) – provide a balance of compute, memory, and network resources and are the best all-around choice for many different database workloads.

·         Memory-optimized classes (includes r and x classes) – have large memory allocations to support those database workloads that process large data sets in memory.

·         Burstable classes (includes t classes) – are the only option available for the free tier and are designed to provide a baseline CPU performance with the ability to burst above this baseline as needed.

Selecting one of these options changes the instances that are available in the instance drop-down from which you make your selection. Selecting the standard classes as shown in Figure 1 will present a drop-down of the m-class instances.

Figure 1. DB Instance class selection for MariaDB (and MySQL)

Selecting one of the other options will filter the list in the drop-down to the applicable classes.

Caution: The lowest m instance class, db.m5.large, with 2vCPUs, 8 GB RAM, and 4,750 Mbps network connectivity will run you $124.83 a month in US East 2, so even a momentary creation has the chance to cost you! The t instance classes are the ones that include the free tier versions.

The next section in the setup is the storage section, with the same options that you had when going through the MySQL steps, though the default values may be different based upon the instance class that you selected. After the storage section is the second “greyed out” area that we saw when we walked through setting up MySQL, Availability & durability.

One of the best features of RDS is how it makes the installation and configuration of a new RDBMS painless when you think about what you would have to do to manage the configuration and maintenance of a standby instance on your own. For those instances where your data needs to be as available as possible, the ability to create (and forget about) a standby instance by checking a radio button should not be overlooked. Creating a replica will configure a synchronous standby replica in a different Availability Zone than the primary DB instance. In the case of a planned or unplanned outage of the primary instance, RDS will automatically fail over to the standby. When using a multi-AZ deployment, however, you will be paying approximately twice as much for the duplicated instances, as shown in Figure 2.

Figure 2. Estimated monthly costs with standby enabled

Once you have selected the appropriate availability option, in this case we chose to enable a standby instance, the rest of your experience will be the same as it was for MySQL, setting up Database authentication and Additional configuration. You can keep the defaults in these sections and go ahead and create your database or change the values as desired to get a better understanding of each area.

With identical pricing between MySQL and MariaDB, and similar APIs and other interactions, you may be wondering what the differences are between the two.

Selecting between MySQL and MariaDB

My recommendation when you are trying to select between MySQL and MariaDB? All other things being equal, go with MariaDB. Why? Primarily because of the advanced capabilities that MariaDB offers, such as its performance optimizations and its ability to work with large data sets. MariaDB has also spent a lot of effort adding query optimizations for queries that use joins, sub-queries, or derived tables, so its overall performance is better than you will find with MySQL. Lastly, MariaDB provides better monitoring through the introduction of microsecond precision and extended user statistics.

However, there are occasions when MySQL makes more sense than MariaDB: generally, when you are using features available in MySQL that are not available in MariaDB, such as geographical processing, JSON stored as binary objects rather than text, or MySQL authentication features such as the ability to authenticate to the database via roles or for a user to activate multiple roles at the same time.

The key is that both are available, and both provide support for .NET development in AWS. However, you do not have to limit your choices to just MariaDB or MySQL, as there is another open-source database that is supported in Amazon RDS that is worth a review. And that’ll be the next post!

Amazon RDS – MySQL for .NET Developers

Originally released in 1995, MySQL has gone through a series of “owners” since then, with it currently being primarily developed by Oracle. MySQL is free and open-sourced under the terms of the GNU General Public License (GPL). That AWS does not have to pay any licensing fee is one of the primary reasons that the cost for MySQL on Amazon RDS is the lowest of all their offerings; you are only paying for hardware and management rather than hardware, licensing, and management.

MySQL may not be as fully featured or high-powered as some of the commercial systems such as SQL Server and Oracle; however, that does not mean that it is not of use to a .NET developer. One of the reasons that MySQL became popular was its relative simplicity and ease of use, so if all you are looking for is the ability to easily persist data in a relational database, then more than likely MySQL will support your needs at a fraction of the cost of SQL Server.

MySQL and .NET

Before we dig into the RDS support for MySQL, let us first briefly go over using MySQL during .NET development. The primary use case when thinking about .NET and database products is the support for Entity Framework, .NET’s default object-relational mapping (ORM) system. If there is support for that database, then using that database in your .NET application will come down to the features of the database rather than its interaction with .NET. With that in mind, let’s look at what you need to do to use MySQL and Entity Framework in your application.

The first thing you need to do is to include the necessary NuGet package, MySql.EntityFrameworkCore. Once you have the package, next is configuring your application to use MySQL. You do this by calling the UseMySQL method when overriding the OnConfiguring method in the context class as shown below:

protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
{
    optionsBuilder.UseMySQL("connection string here");
}

A connection string for MySQL has four required fields:

  • server – server with which to connect
  • uid – user name
  • pwd – password
  • database – database with which to connect
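
Putting those four fields together, the string passed to UseMySQL might look something like the following (the endpoint and credentials shown are placeholders):

"server=prodotnetonaws.xxxxxxxxxxxx.us-east-2.rds.amazonaws.com;uid=admin;pwd=yourPassword;database=appdb"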

From here on out, it is just like working with Entity Framework and SQL Server. Kinda anti-climactic, isn’t it? Let’s now go create a MySQL database.

Setting up a MySQL Database on Amazon RDS

Now that we know how to set up our .NET application to access MySQL, let’s go look at setting up MySQL. Log into the console, go to RDS, and select Create database. On the Create database screen, select Standard create and then MySQL. Doing this will show you that there is only one Edition that you can select, MySQL Community. You then have a lot of different release versions that you can select from; however, the NuGet packages that we used in our earlier example require a reasonably modern version of MySQL, so unless you have a specific reason to use an older version, you should use the default, most recent version.

Once you have defined the version of MySQL that you will use, your next option is to select the Template that you would like to use. You have three different templates to choose from:

Production – defaults are set to support high availability and fast, consistent performance.

Dev/Test – defaults are set in the middle of the range.

Free tier – defaults are set to the minimal, free version.

We are going to select the Free tier template to limit our costs for this walkthrough! This will preset many of the default values that you will see during the rest of the server configuration; more on these later.

The next section is Settings. Here you will create the DB instance identifier and the Master username and Master password, the login credentials as shown in Figure 1. Note that the DB instance identifier needs to be unique across all the account’s DB instances in this region, not just MySQL database instances. We used “prodotnetonaws” as both the instance identifier and the Master username. If you choose to Auto generate a password, you will get an opportunity to access that password immediately after the database is created.

Figure 1. Naming the DB instance and creating the master user

Scrolling down to the next section, DB instance class, will show that the instance class being used is the db.t2.micro (or comparable, depending upon when you are reading this) which is the free tier-compatible instance type. The next section down the page, Storage, is also filled out with the free version of storage, defaulting to 20 GiB of Allocated storage. Do not change either of these values to stay within the “free” level.

There are four additional sections. The first of these is Availability & durability, where you can create a replica in a different availability zone. Amazon RDS will automatically fail over to the standby in the case of a planned or unplanned outage of the primary. If you selected the “Free” template, then this whole section will be greyed out and you will not be able to create a Multi-AZ deployment. The second section is Connectivity. This is where you assign your Virtual private cloud (VPC) and Subnet group, determine whether your RDS instance has Public access, and assign a VPC security group. You can also select an Availability zone if desired. We left all these values at their default.

The third section is Database authentication. You have three options in this section, with the first being Password authentication (the default value) where you manage your database user credentials through MySQL’s native password authentication features. The second option in this section is Password and IAM database authentication where you use both MySQL’s password authentication features and IAM users and roles, and the last is Password and Kerberos authentication, where you use both MySQL’s password authentication features and an AWS Managed Microsoft Active Directory (AD) created with AWS Directory Service.

The last section when creating an RDS database is Additional configuration. This allows you to add any Database options, configure Backup, add Monitoring, configure Logging and Maintenance, and turn on Deletion protection. When deletion protection is enabled, you are not able to delete the database without first editing the database to turn off that setting. Select the Create database button when completed. This will bring you back to the Databases screen, where you will likely see a notice that the database is being created, as shown in Figure 2.

Figure 2. Notice that an RDS database is being created

If you had selected to auto generate a password, the notice shown in Figure 2 also includes the View credential details button that you would need to click to see the generated value. Once the database is Available, you can interact with it like any other database, using the endpoint and port values shown in the Connectivity & security tab of the database details, as shown in Figure 3, for the connection string in your .NET application.

Figure 3. MySQL database details screen showing endpoint and port

MySQL is the most used open-source relational database in the world. However, as mentioned earlier, Oracle took control of the project in 2010 and this created some angst amongst MySQL users. This led to a forking of the source code by some of the original MySQL developers and the creation of a new open-source relational database based upon the MySQL code: MariaDB. We will look at that next.

.NET and Amazon DynamoDB

Where DocumentDB is a document store that is built around the MongoDB API, DynamoDB is a fully managed, serverless, NoSQL, key-value database. While originally built as a key-value database, DynamoDB also supports document data models, which allows you to get the speed of key-value databases along with the ability to do searches within the document that was stored as a value. It uses built-in horizontal scaling to scale to more than 10 trillion requests per day with peaks greater than 20 million requests per second over petabytes of storage. In other words, DynamoDB scales really well!

There are two different capacity modes in which you can configure each DynamoDB database, On-demand and Provisioned. If these terms sound familiar, that should not be a surprise as these are the same options you get with Amazon Aurora, AWS’s serverless relational database offering. With on-demand capacity, you pay per request for data reads and writes into your database. You do not have to specify how many reads or writes there will be, as DynamoDB scales itself to support your needs as that number increases or decreases. With provisioned capacity, you specify the number of reads and writes per second, and you can use auto-scaling to adjust capacity as the usage of your system grows. Provisioned capacity is less expensive, but you may end up over-provisioning your server and paying for more than you need.

Setting up a DynamoDB Database

The easiest way to get a feel for DynamoDB is to use it, so let’s set one up. Log in to the console, and either search for DynamoDB or find it using Services > Database > DynamoDB. Select the Create table button on the dashboard to bring up the Create table page. The first setup section is Table details and is shown in Figure 1.

Figure 1. Creating a DynamoDB table

There are three different values that you can enter. The first, Table name, is straightforward and needs to be unique by region. We used “Person” to identify the type of data that we are going to be storing in the table. The second value is the Partition key and the third value is an optional Sort key. A simple primary key is made up only of the partition key, and no two items in the table can share the same partition key. A composite primary key, on the other hand, is made up of both a partition key and a sort key. All items with the same partition key are stored together, sorted by the sort value. If using a composite primary key, you can have multiple instances of a partition key, however, the combination of partition key and sort key must be unique.

Note – One instance where we have seen a composite key used to great effect is when different versions are being kept. As a new version is created, it gets the same partition key but a new sort key, in that case, a version number.

The keys can be one of three types: Binary, Number, or String. In our case, as you can see in Figure 1, we created a string partition key of “Id” without a sort key.

The next configuration section on the page is Settings, where you can select either the Default settings or Customize settings. Going with the default settings takes all the fun out of the rest of the configuration section of this article, so select to customize settings.

The next section is Table class, where you have two options, DynamoDB Standard and DynamoDB Standard-IA (infrequent access). The determination between these two is based upon the frequency with which data will be accessed. The less frequently your table will have reads and writes performed against it, the more likely that the Standard-IA class will be appropriate.

Next comes the Capacity calculator section. This is an interesting tool that helps you translate your typical (or predicted) usage into the generic Read and Write units that are used for configuration, pricing, and billing. You will need to expand the section before it becomes fully available, but when you do you will get a series of fields as shown in Figure 2.

Figure 2. DynamoDB Capacity calculator

The questions that it asks are straightforward: how big your average payload is, how many reads per second you will have, how many writes per second, and what your business requirements are for those reads and writes. Let’s look at those options in a bit more detail. Average item size (kb) is an integer field capturing your average payload rounded to the nearest kilobyte. This can be frustrating, because many times your payloads may be considerably smaller than a kilobyte, but go ahead and choose 1 if your payloads are relatively small. The Item read/second and Item write/second are also straightforward integer fields, and we used 25 items read per second and 4 items written per second.

The Read consistency and Write consistency fields are a little different as they are drop-downs. Read offers Eventually consistent, where it is possible that a read may not have the most recent version of the data (because it is coming from a read-only copy of the data), Strongly consistent where all reads will have the most recent version of the data (because it is coming from the primary table) and Transactional where multiple actions are submitted as a single all-or-nothing operation. Write consistency offers two approaches, Standard where the data is inserted into the primary, and Transactional which is the same as for Read consistency. In our example, we selected Strongly consistent for reads and Standard for writes. The calculator then estimated our costs at $4.36 a month in US-West-2. As you can see, it’s a pretty inexpensive option.
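
As a rough sanity check on those numbers: assuming the standard DynamoDB capacity-unit definitions (one write unit covers one 1 KB write per second, and one read unit covers one strongly consistent read per second of an item up to 4 KB), 25 strongly consistent reads per second of 1 KB items works out to roughly 25 read units, and 4 writes per second of 1 KB items to roughly 4 write units, which lines up with the small monthly estimate the calculator produced.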

The next section in the table creation screen is the Read/write capacity settings. This is where the two modes we touched on earlier come in, On-demand and Provisioned. Since we went through the calculator and estimated a whopping $4.36 a month charge, we will go ahead and use the simpler On-demand option.

The next section is Secondary indexes. This is where DynamoDB varies from a lot of other Key/Value stores because it allows you to define indexes into the content – which tends to be more of a document database approach. There are two types of secondary indexes, Global and Local.  A global secondary index provides a different partition key than the one on the base table while a local secondary index uses the same partition key and a different sort key. The local secondary index requires that the base table already be using both a partition key and a sort key, but there is no such constraint on the use of a global secondary key.

The next configuration section is Encryption at rest. There are three choices: Owned by Amazon DynamoDB, where DynamoDB owns and manages the encryption keys; AWS managed key, where AWS creates keys and then manages those keys within your AWS Key Management Service (KMS); and Customer managed key, where you create and manage the KMS key yourself. We selected the AWS-owned key.

The last configuration section is Tags. Once your table is configured, select the Create table button at the bottom. This will bring up the Tables listing with your table in a “Creating” status as shown in Figure 3. Once the creation is complete, the status will change to “Active”.

Figure 3. Tables listing after creating a table.

Unlike Amazon DocumentDB, DynamoDB gives you the ability to work directly with the data within the table. Once the table has been created, click on the table name to go into the table detail screen. This page gives you a lot of information, including metrics on table usage. We won’t go into that in much detail, simply because it could use its own book, so instead click on the button at the top, Explore table items. This will bring you to a page where you can interact with the items within the table. There is also a Create item button at the top. We used this button to create two simple items in the table, with the first shown in Figure 4.

Figure 4. Creating an item in Amazon DynamoDB.

If this format seems a little unusual, it is because this is the DynamoDB JSON, which is different from “traditional” JSON in that it stores the items as their own key/value pairs. If you turn off the View DynamoDB JSON selector at the top, then you will see the more standard JSON:

{
 "Id": "{29A25F7D-C2C1-4D82-9996-03C647646428}",
 "FirstName": "Bill",
 "LastName": "Penberthy"
}
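
For comparison, with the View DynamoDB JSON selector turned on, the same item is displayed with each value wrapped in a type descriptor (“S” indicating a string):

{
 "Id": {"S": "{29A25F7D-C2C1-4D82-9996-03C647646428}"},
 "FirstName": {"S": "Bill"},
 "LastName": {"S": "Penberthy"}
}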

DynamoDB and AWS Toolkit for Visual Studio

Unlike DocumentDB, which has no support in any of the IDE toolkits, you have the ability to access DynamoDB from the Toolkit for Visual Studio. Using the toolkit, you can both view the table and look at items within the table as shown in Figure 5.

Figure 5. Using the Toolkit for Visual Studio to view DynamoDB items

You can even use the toolkit to filter the results by selecting the Add button within the top box. This will add a line with a drop-down that includes all of the field names (Id, FirstName, and LastName) and allow you to enter a filter value. Selecting “LastName” “Equal” “Steve” and then clicking the Scan Table button will result in only one result remaining in the list, as shown in Figure 6.

Figure 6. Filtering DynamoDB items in Visual Studio

The toolkit will also allow you to add, edit, and delete items simply by double-clicking on the item in the result list. Since you are working with the items in a list form, you can even use the Add Attribute button to add a new “column” where you can capture new information. Once you add a value to that new column and Commit Changes, those items (where you added the value) will be updated.

As you can imagine, the ability to interact with the data directly in Visual Studio makes working with the service much easier, as you can look directly into the data to understand what you should get when parts of your code are run in Debug or when running integration tests. Unfortunately, however, this functionality is only supported in the AWS Toolkit for Visual Studio and is not available in either Rider or Visual Studio Code toolkits.

DynamoDB and .NET

The last step is to take a look at using DynamoDB within your .NET application. As mentioned earlier, using DynamoDB means that you will not be using Entity Framework as you did with the relational databases earlier. Instead, we will be using a DynamoDB context, which provides support very similar to the DbContext in Entity Framework.

Note: One of the interesting features of using DynamoDB within your development process is the availability of a downloadable version of DynamoDB. Yes, you read that correctly, you can download and locally install a version of the DynamoDB as either a Java application, an Apache Maven dependency, or as a Docker image. Details on this can be found at https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/DynamoDBLocal.html
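
If you do experiment with the local version, the SDK client can be pointed at it by overriding the service URL; a minimal sketch, assuming the local instance is listening on its default port of 8000:

using Amazon.DynamoDBv2;

// Point the client at a locally running DynamoDB instead of the AWS endpoint
var config = new AmazonDynamoDBConfig { ServiceURL = "http://localhost:8000" };
var client = new AmazonDynamoDBClient(config);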

In many ways, the .NET SDK for DynamoDB is one of the more advanced SDKs as it offers support in three layers:

Low-level interface – the APIs in this interface relate very closely to the service model and there is minimal help functionality.

Document interface – This API includes constructs around the Document and Table classes so there is minimal built-in functionality to help do things like converting to business objects.

High-level interface – This is where AWS provides support around converting Documents to .NET classes and other helpful interactions.

Your code can interact with any of the interfaces based upon your business need. We will be relying on the high-level interface as we move into the code examples.

First, you need to add the appropriate NuGet package, AWSSDK.DynamoDBv2. Once you have that added, the next thing that you need to do is to configure your connection to the database. The following code snippet shows a constructor method to do this.

using Amazon.DynamoDBv2;
using Amazon.DynamoDBv2.DataModel;

private AmazonDynamoDBClient client;
private DynamoDBContext context;

public DataClient()
{
    // Low-level client that uses the default credentials and region configuration
    client = new AmazonDynamoDBClient();
    // High-level context built on top of the low-level client
    context = new DynamoDBContext(client);
}

There are two objects introduced in this snippet. The first class introduced is the Amazon.DynamoDBv2.AmazonDynamoDBClient. This class provides the default implementation for accessing the service. The constructor used in the example will default to credentials stored in the application’s default configuration. Running this on your local machine means that the application will use your “default” profile to connect. There are other constructors that you can use, ranging from passing in your Access Key ID and Secret Key, to using Credentials stored in AWS Key Manager. For this example, however, we will stick with the default constructor. The second object introduced is the Amazon.DynamoDBv2.DataModel.DynamoDBContext. The DataModel part of the namespace indicates that this is a high-level interface based upon the low-level interface offered by the AmazonDynamoDBClient class.
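
As a quick illustration of those alternatives, the snippet below sketches the region-specific and explicit-credential constructors (the key values are placeholders, and in practice stored profiles or roles are preferable to hard-coded keys):

using Amazon;
using Amazon.DynamoDBv2;
using Amazon.Runtime;

// Use the default credential chain, but target a specific region
var regionalClient = new AmazonDynamoDBClient(RegionEndpoint.USWest2);

// Supply credentials explicitly (placeholder values shown)
var explicitClient = new AmazonDynamoDBClient(
    new BasicAWSCredentials("ACCESS_KEY_ID", "SECRET_ACCESS_KEY"),
    RegionEndpoint.USWest2);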

Now that we have defined the context, let’s look at how you would use it. The following is a method to save an object into the table.

public async Task SaveAsync<T>(T item)
{
    await context.SaveAsync<T>(item);
}

This is where you start to see the power offered by the high-level interface. Let’s step out of this class and look at how this is used.

public class Person
{
    public string Id { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

public async Task<string> Add(string firstName, string lastName)
{
    var client = new DataClient();

    Person itemToAdd = new Person { 
        Id = Guid.NewGuid().ToString("B").ToUpper(), 
        FirstName = firstName, 
        LastName = lastName 
    };

    await client.SaveAsync<Person>(itemToAdd);
    return itemToAdd.Id;
}

This Add method is taking in a first name and a last name, creates a Person object, persists the information to DynamoDB, and then returns the Id to the calling method. And that is what the high-level interface offers. You could do the same work yourself using the Document interface, but you would have to manage all of the serialization and deserialization necessary to convert from the business objects to the JSON that is stored in the table.

One other feature of the high-level interface is much less obvious. Think about when we created the DynamoDB table earlier, and the name that we used – “Person”. By default, the high-level interface expects the class name of the item being persisted to be the same as the table name, as it is in our case.
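
If the names did not match, the mapping can be made explicit with attributes from the Amazon.DynamoDBv2.DataModel namespace; a minimal sketch, using a hypothetical table named “People”:

using Amazon.DynamoDBv2.DataModel;

// Maps this class to the "People" table instead of the default "Person"
[DynamoDBTable("People")]
public class Person
{
    [DynamoDBHashKey] // partition key
    public string Id { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
}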

We just went over adding an item to the table through the high-level interface. Let’s now look at an example of retrieving an item.

public async Task<T> FindByIdAsync<T>(string id)
{
    var condition = new List<ScanCondition> {
        new ScanCondition("Id", ScanOperator.Equal, id) };
    AsyncSearch<T> search = context.ScanAsync<T>(condition);
    var list = await search.GetRemainingAsync();
    return list.FirstOrDefault();
}

You can see that this gets a little more complicated. Because this code is doing a scan of the data, it is always going to return a List<T>, even though we set the Id as the primary key on the table. This happens because the high-level interface does not know anything about the definition of the table itself and thus generalizes the result set.

This scanning approach should not feel new, however. Think back to how the filtering was set in the AWS Toolkit for Visual Studio (Figure 6) and you will see that this is the same approach. This approach is used because of those enhancements into DynamoDB that make it more document database-like; it allows you to scan through the data looking for a specific condition, in this case, the Id equal to the value passed into the FindByIdAsync method. And, just as shown in the toolkit, you can use multiple conditions.

public async Task<List<T>> FindByValueAsync<T>(Dictionary<string,object> searchDict)
{
    var conditions = new List<ScanCondition>();
    foreach(string key in searchDict.Keys)
    {
        conditions.Add(
          new ScanCondition(key,
                    ScanOperator.Equal,
                    searchDict[key]));
    }
    AsyncSearch<T> search = context.ScanAsync<T>(conditions);
    return await search.GetRemainingAsync();
}

In this instance, we are simply accepting a Dictionary<string, object> where we are assuming the key will be the field name, such as LastName, and the dictionary value will be the value to use when filtering. An empty dictionary means that no filters will be set which, as you can imagine, would be somewhat terrifying if you consider a massive table with petabytes of data. That’s where the return type of the ScanAsync method comes into play, the AsyncSearch<T> class.

AsyncSearch is an intermediate class that provides several ways of interacting with the service. In the code example above, the method used on that object was GetRemainingAsync(). The GetRemainingAsync method is used to get all the remaining items that match the filter condition and bring them back as a single unit. However, there is another method on AsyncSearch, GetNextSetAsync, which manages a finite set of items – up to 1 MB of items. You can examine a property on the AsyncSearch object, IsDone, which tells you whether the current result set is the final one and gives you the ability to manage the pagination yourself.
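
A minimal pagination sketch along those lines, written as another method on the same DataClient class (handlePage is a hypothetical callback that processes each batch):

public async Task ProcessInPagesAsync<T>(List<ScanCondition> conditions, Func<List<T>, Task> handlePage)
{
    AsyncSearch<T> search = context.ScanAsync<T>(conditions);
    while (!search.IsDone)
    {
        // Each call returns the next batch of matching items (up to roughly 1 MB)
        List<T> page = await search.GetNextSetAsync();
        await handlePage(page);
    }
}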

We have spent time going through the High-level interface provided by the SDK. We have not even touched on the powerful Document interface and how that provides more specific control over the values stored in DynamoDB. There are many examples of development teams using the Document-level interfaces and writing their own high-level interfaces where they can incorporate their specific requirements rather than using the SDK’s more generic approach. That approach is not wrong, as no one understands your needs as well as you do – but having the High-level interface allows you to easily get a large majority of requirements fulfilled and then you can customize using the Document interface as you see fit.

There is a lot more that we can go into about using DynamoDB with .NET, like an entire book, so we won’t do that here. Let it suffice to say, yes, you can use DynamoDB with .NET.

Amazon DocumentDB and .NET

Amazon DocumentDB is a fully managed, scalable, and highly available document database service that uses a distributed, fault-tolerant, self-healing storage system that auto-scales up to 64 TB per cluster. Amazon DocumentDB reduces database I/O by only persisting write-ahead logs and does not need to write full buffer page syncs, avoiding slow, inefficient, and expensive data replication across network links.

This design allows for the separation of compute and storage, which means that you can scale each of those areas independently. DocumentDB is designed for 99.99% availability and replicates six copies of your data across three Availability Zones. It also continually monitors cluster instance health and automatically fails over to a read replica in the event of a failure, typically in less than 30 seconds. You can start with a single instance that delivers high durability and then, as you grow, add a second instance for high availability and easily increase the number of instances that you use for read-only workloads. You can scale read capacity to millions of requests per second by scaling out to 15 low-latency read replicas spread across three Availability Zones.

One of the interesting characteristics is that DocumentDB was built with MongoDB compatibility. Why is that important?

MongoDB

MongoDB is a source-available, cross-platform document database that was first released in 2009 and is, by far, the most popular document database available. Since it was both first to market and available as open source, it tended to be adopted early by companies running on-premises and experimenting with the concept of document stores, so there is a huge number of on-premises software systems that rely completely, or in part, on MongoDB. This means that any movement of those systems to the cloud has to come with a way to support the software written around MongoDB; otherwise, this would be a massive impediment to migrating to the cloud.

AWS realized this and released DocumentDB to include compatibility with the MongoDB APIs that were available at the time. At the time of this writing (and also at release), DocumentDB implements the MongoDB 3.6 and 4.0 API responses. However, MongoDB released version 5.0 in July 2021, so there is no feature parity. Also, since there were subsequent releases (4.2, 4.4, 4.4.5, and 4.4.6) that AWS did not support, it appears that there will not be additional compatibility support moving forward.

While compatibility with the most recent versions of MongoDB is not available, the design of DocumentDB, together with optimizations like advanced query processing, connection pooling, and optimized recovery and rebuild, provides a very performant system. AWS claims that DocumentDB achieves twice the throughput of currently available MongoDB managed services.

Setting up a DocumentDB Database

Now that we have a bit of understanding about DocumentDB, let’s go set one up. Log in to the console, and either search for Document DB or find it using Services > Database > Amazon DocumentDB. Select the Create Cluster button on the dashboard to bring up the Create cluster page as shown in Figure 1.

Figure 1. Create Amazon DocumentDB Cluster screen

First, fill out the Configuration section by adding in a Cluster identifier. This identifier must be unique to DocumentDB clusters within this region. Next, you’ll see a dropdown for the Engine version. This controls the MongoDB API that you will want to use for the connection, which becomes important when selecting and configuring the database drivers that you will be using in your .NET application. We recommend that you use 4.0, as this gives you the ability to support transactions. The next two fields are Instance class and Number of instances. This is where you define the compute and memory capacity of the instance as well as the number of instances that you will have. For the Instance class, the very last selection in the dropdown is db.t3.medium, which is eligible for the free tier, so we selected that class. When considering the number of instances, you can have a minimum of one instance, which handles both reads and writes, or you can have more than one instance, which would be a primary instance plus replicas. We chose two (2) instances so that we can see both primary and replica instances.

You’ll see a profound difference if you compare this screen to the others that we saw when working with RDS, as this screen is much simpler and seems to give you much less control over how you set up your cluster. The capability to have finer control over the configuration is available, however; you just need to click the slider button at the lower left labeled Show advanced settings. Doing that will bring up the Network settings, Encryption, Backup, Log exports, Maintenance, Tags, and Deletion protection configuration sections that you will be familiar with from the RDS chapters.

Once you complete the setup and click the Create cluster button you will be returned to the Clusters list screen that will look like Figure 2.

Figure 2. Clusters screen immediately after creating a cluster

As Figure 2 shows, there are initial roles for a Regional cluster and a Replica instance. As the creation process continues, the regional cluster will be created first, and then the first instance in line will change role to become a Primary instance. When creation is completed, there will be one regional cluster, one primary instance, and any number of replica instances. Since we chose two instances when creating the cluster, we show one replica instance being created. When the cluster is fully available you will also have access to the regions in which the clusters are available. At this point, you will have a cluster on which to run your database, but you will not yet have a database to which you can connect. But, as you will see below when we start using DocumentDB in our code, that’s ok.

DocumentDB and .NET

When looking at using this service within your .NET application, you need to consider that like Amazon Aurora, DocumentDB emulates the APIs of other products. This means that you will not be accessing the data through AWS drivers, but instead will be using MongoDB database drivers within your application. It is important to remember, however, that your database is not built using the most recent version of the MongoDB API, so you must ensure that the drivers that you do use are compatible with the version that you selected during the creation of the cluster, in our case MongoDB 4.0. Luckily, however, MongoDB provides a compatibility matrix at https://docs.mongodb.com/drivers/csharp/.

The necessary NuGet package is called MongoDB.Driver. Installing this package will bring several other packages with it, including MongoDB.Driver.Core, MongoDB.Driver.BSON, and MongoDB.Libmongocrypt. One of the first things that you may notice is that there is no “Entity Framework” kind of package like we are used to when working with relational databases. That makes sense, because Entity Framework is basically an Object-Relational Mapper (ORM) that helps manage the relationships between different tables – which is exactly what we do not need when working with NoSQL systems. Instead, your table is simply a collection of things. However, using the MongoDB drivers still allows you to use well-defined classes when mapping results to objects. It is, however, more of a deserialization process than an ORM process.

When working with DocumentDB, you need to build your connection string using the following format: mongodb://[user]:[password]@[hostname]/[database]?[options]

This means you have five components to the connection string:

  • user – user name
  • password – password
  • hostname – url to connect to
  • database – Optional parameter – database with which to connect
  • options – a set of configuration options
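
Putting those pieces together, a DocumentDB connection string typically looks something like the following (the host, credentials, and options shown here are placeholders for illustration; your actual string comes from the console as described next):

mongodb://admin:yourPassword@docdb-cluster.cluster-xxxxxxxxxxxx.us-east-2.docdb.amazonaws.com:27017/?tls=true&replicaSet=rs0&readPreference=secondaryPreferred&retryWrites=false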

It is easy to get the connection string for DocumentDB, however. In the DocumentDB console, clicking on the identifier for the regional cluster will bring you to a summary page. Lower on that page is a Connectivity & security tab that contains examples of approaches for connecting to the cluster. The last one, Connect to this cluster with an application, contains the cluster connection string as shown in Figure 3.

Figure 3. Getting DocumentDB connection string

You should note that the database name is not present in the connection string. You can either add it to the connection string or you can specify it separately in your code, as we do in the example below.

Once you have connected to the database and identified the collection that you want to access, you are able to access it using lambdas (the anonymous functions, not AWS’ serverless offering!). Let’s look at how that works. First, we define the model that we want to use for the deserialization process.

public class Person
{
    [BsonId]
    [BsonRepresentation(BsonType.ObjectId)]
    public string Id { get; set; }

    [BsonElement("First")]
    public string FirstName { get; set; }

    [BsonElement("Last")]
    public string LastName { get; set; }
}

What you’ll notice right away is that we are not able to use a plain old CLR object (POCO) but instead must provide some MongoDB BSON attributes. Getting access to these will require the following libraries to be added to the using statements:

using MongoDB.Bson;
using MongoDB.Bson.Serialization.Attributes; 

There are many different annotations that you can set, of which we are using three:

·         BsonId – this attribute defines the document’s primary key. This means it will become the easiest and fastest field on which to retrieve a document.

·         BsonRepresentation – this attribute is primarily to support ease of use. It allows the developer to pass the parameter as type string instead of having to use an ObjectId structure, as this attribute handles the conversion from string to ObjectId.

·         BsonElement – this attribute maps the property name “Last” from the collection to the object property “LastName” and the property name “First” to the object property “FirstName”.

That means that the following JSON shows a record that would map to a Person type.

{
 "Id": "{29A25F7D-C2C1-4D82-9996-03C647646428}",
 "First": "Bill",
 "Last": "Penberthy"
}

We mentioned earlier that accessing the items within DocumentDB becomes straightforward when using lambdas. Thus, a high-level CRUD service could look similar to the code in Listing 1.

using MongoDB.Driver;

public class PersonService
{
    private readonly IMongoCollection<Person> persons;

    public PersonService ()
    {
        var client = new MongoClient("connection string here");
        var database = client.GetDatabase("Production");
        persons = database.GetCollection<Person>("Persons");
    }

    public async Task<List<Person>> GetAsync() => 
        await persons.Find(_ => true).ToListAsync();

    public async Task<Person?> GetByIdAsync(string id) =>
        await persons.Find(x => x.Id == id).FirstOrDefaultAsync();

    public async Task<Person?> GetByLastNameAsync(string name) =>
        await persons.Find(x => x.LastName == name).FirstOrDefaultAsync();

    public async Task CreateAsync(Person person) =>
        await persons.InsertOneAsync(person);

    public async Task UpdateAsync(string id, Person person) =>
        await persons.ReplaceOneAsync(x => x.Id == id, person);

    public async Task RemoveAsync(string id) =>
        await persons.DeleteOneAsync(x => x.Id == id);
}

  Code Listing 1. CRUD service to interact with Amazon DocumentDB

One of the conveniences of working with DocumentDB (and MongoDB for that matter) is that creating the database and the collection is automatic when the first item (or document) is saved. Thus, creating the database and the Persons collection is just as simple as saving your first person. Of course, that means you have to make sure that you define your database and collection names correctly, but it also means that you don’t have to worry about your application not connecting if you mistype the database name – you’ll just have a new database!
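
As a minimal usage sketch of the service in Listing 1 (reusing the hypothetical “Production” database and “Persons” collection names from that listing), saving the first person is all it takes to bring both into existence:

var service = new PersonService();

// The "Production" database and "Persons" collection are created on first insert if they do not already exist;
// the Id should be assigned by the driver when the document is inserted
await service.CreateAsync(new Person { FirstName = "Bill", LastName = "Penberthy" });

var everyone = await service.GetAsync();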