.NET and Amazon EventBridge

As briefly mentioned in an earlier post, Amazon EventBridge is a serverless event bus service designed to deliver data from applications and services to a variety of targets. It distributes events using a different methodology than SNS does.

The event producers submit their events to the service bus. From there, a set of rules determines what messages get sent to which recipients. This flow is shown in Figure 1.

Figure 1. Message flow through Amazon EventBridge.

The key difference between SNS and EventBridge is that in SNS you send your message to a topic, so the sender makes some decisions about where the message is going. These topics can be broadly defined and domain-focused, so that any application interested in order-related messages subscribes to the order topic, but this still obligates the sender to have some knowledge of the messaging system.

In EventBridge you simply toss messages onto the bus and the rules route them to the appropriate destinations. Thus, unlike SNS, where the messages themselves matter less than the topic, in EventBridge you cannot define rules without an understanding of the messages to which you want to apply them. With that in mind, we will go in a slightly different order and start with using EventBridge from within a .NET application, so that we have a definition of the message on which we want to apply rules.

.NET and Amazon EventBridge

The first step to interacting with EventBridge from within your .NET application is to install the appropriate NuGet package, AWSSDK.EventBridge. This will also install AWSSDK.Core. Once you have the NuGet package, you can access the appropriate APIs by adding several using statements:

using Amazon.EventBridge;
using Amazon.EventBridge.Model;

You will also need to ensure that you have added:

using System.Collections.Generic;
using System.Text.Json;

These namespaces provide access to the AmazonEventBridgeClient class that manages the interaction with the EventBridge service, as well as the models that are represented in the client methods. As with SNS, you can manage all aspects of creating the various EventBridge parts, such as event buses, rules, and endpoints. You can also use the client to push events to the bus, which is what we will do now. Let's first look at the complete code and then walk through the various sections.

static void Main(string[] args)
{
    var client = new AmazonEventBridgeClient();

    var order = new Order();

    var message = new PutEventsRequestEntry
    {
        Detail = JsonSerializer.Serialize(order),
        DetailType = "CreateOrder",
        EventBusName = "default",
        Source = "ProDotNetOnAWS"
    };

    var putRequest = new PutEventsRequest
    {
        Entries = new List<PutEventsRequestEntry> { message }
    };

    var response = client.PutEventsAsync(putRequest).Result;
    Console.WriteLine(
        $"Request processed with ID of #{response.ResponseMetadata.RequestId}");
    Console.ReadLine();
}

The first thing we are doing in the code is newing up our AmazonEventBridgeClient so that we can use the PutEventsAsync method, which is the method used to send events to EventBridge. That method expects a PutEventsRequest object that has an Entries field, which is a list of PutEventsRequestEntry objects. There should be a PutEventsRequestEntry object for every event that you want to be processed by EventBridge, so a single push to EventBridge can include multiple events.

Tip: One model of event-based architecture is to use multiple small messages that represent different items of interest. Processing an order, for example, may result in a message regarding the order itself as well as messages regarding each of the products included in the order, so that inventory counts can be managed correctly. This means the Product domain doesn't listen for order messages; it only pays attention to product messages. Each of these approaches has its own advantages and disadvantages.
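As a rough sketch of what that can look like with the SDK, the order event and a per-product event can be batched into a single PutEventsRequest (the service caps the number of entries per call, 10 at the time of writing). The "ProductOrdered" detail type and the shape of order.OrderDetails are hypothetical stand-ins for your own domain model.

// Sketch: batch an order event plus one event per ordered product into a single put.
// "ProductOrdered" and the OrderDetails collection are hypothetical; the bus and
// source values mirror the earlier example.
var entries = new List<PutEventsRequestEntry>
{
    new PutEventsRequestEntry
    {
        Detail = JsonSerializer.Serialize(order),
        DetailType = "CreateOrder",
        EventBusName = "default",
        Source = "ProDotNetOnAWS"
    }
};

foreach (var orderDetail in order.OrderDetails)
{
    entries.Add(new PutEventsRequestEntry
    {
        Detail = JsonSerializer.Serialize(orderDetail),
        DetailType = "ProductOrdered",
        EventBusName = "default",
        Source = "ProDotNetOnAWS"
    });
}

var batchResponse = await client.PutEventsAsync(
    new PutEventsRequest { Entries = entries });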

The PutEventsRequestEntry contains the information to be sent. It has the following properties:

·         Detail – a valid JSON object that cannot be more than 100 levels deep.

·         DetailType – a string that provides information about the kind of detail contained within the event.

·         EventBusName – a string that determines the appropriate event bus to use. If absent, the event will be processed by the default bus.

·         Resources – a List<string> that contains ARNs which the event primarily concerns. May be empty.

·         Source – a string that defines the source of the event.

·         Time – a string that sets the time stamp of the event. If not provided EventBridge will use the time stamp of when the Put call was processed.

In our code, we only set the Detail, DetailType, EventBusName, and Source.

This code is set up in a console application, so running it gives results similar to those shown in Figure 2.

Figure 2. Console application that sent a message through EventBridge

We then used Progress Telerik Fiddler to view the request so we can see the message that was sent. The JSON from this message is shown below.

{
    "Entries":
    [
        {
            "Detail": "{\"Id\":0,\"OrderDate\":\"0001-01-01T00:00:00\",\"CustomerId\":0,\"OrderDetails\":[]}",
            "DetailType": "CreateOrder",
            "EventBusName": "default",
            "Source": "ProDotNetOnAWS"
        }
    ]
}

Now that we have the message that we want to process in EventBridge, the next step is to set up EventBridge. At a high level, configuring EventBridge in the AWS console is simple.

Configuring EventBridge in the Console

You can find Amazon EventBridge by searching in the console or by going into the Application Integration group. Your first step is to decide whether you wish to use your account’s default event bus or create a new one. Creating a custom event bus is simple as all you need to provide is a name, but we will use the default event bus.

Before going any further, you should translate the event that you sent into the event that EventBridge will actually process. You do this by going into Event buses and selecting the default event bus, which brings you to the event bus detail page. On the upper right, you will see a Send events button. Clicking it brings you to the Send events page, where you can configure an event. Using the values from the JSON we looked at earlier, fill out the values as shown in Figure 3.

Figure 3. Getting the “translated” event for EventBridge

Once filled out, clicking the Review button brings up a window with a JSON object. Copy and paste this JSON as we will use it shortly. The JSON that we got is displayed below.

{
  "version": "0",
  "detail-type": "CreateOrder",
  "source": "ProDotNetOnAWS",
  "account": "992271736046",
  "time": "2022-08-21T19:48:09Z",
  "region": "us-west-2",
  "resources": [],
  "detail": "{\"Id\":0,\"OrderDate\":\"0001-01-01T00:00:00\",\"CustomerId\":0,\"OrderDetails\":[]}"
}

The next step is to create a rule that will evaluate the incoming messages and route them to the appropriate recipient. To do so, click on the Rules menu item and then the Create rule button. This will bring up Step 1 of the Create rule wizard. Here, you define the rule by giving it a name (which must be unique per event bus), selecting the event bus on which the rule will run, and choosing between Rule with an event pattern and Schedule. Choosing Schedule creates a rule that runs regularly on a specified schedule. We will create a rule with an event pattern.

Step 2 of the wizard allows you to select the Event source. You have three options: AWS events or EventBridge partner events, Other, or All events. The first option references the ability to set rules that identify specific AWS or EventBridge partner services such as Salesforce, GitHub, or Stripe, while the last option allows you to set up destinations that will be forwarded every event that comes through the event bus. We typically see this when there is a requirement to log events in a database as they come in, or some similar business rule. We will select Other so that we can handle custom events from our application(s).

Next you can add a sample event. You don't have to take this action, but it is recommended when writing and testing the event pattern or any filtering criteria. Since we have a sample message, we will select Enter my own and paste the sample event into the box as shown in Figure 4.

Figure 4. Adding a Sample Event when configuring EventBridge

Be warned, however, that if you paste the event directly into the sample event box it will not work: the matching algorithms will reject it as invalid unless an id is added to the JSON, as highlighted by the golden arrow in Figure 4.

Once you have your sample event input, the next step is to create the Event pattern that will determine where this message should be sent. Since we are using a custom event, select the Custom patterns (JSON editor) option. This will bring up a JSON editor window in which you enter your rule. There is a drop-down of helper functions that will put the proper syntax into the window but, of course, there is no option for simple matching – you have to know that syntax already. Fortunately, the pattern mirrors the shape of the event itself, so an event pattern that will select every event with a detail-type of “CreateOrder” is:

{
  "detail-type": ["CreateOrder"]
}

Adding this into the JSON editor and selecting the Test pattern button will validate that the sample event matches the event pattern. Once you have successfully tested your pattern, select the Next button to continue.

You should now be on the Step 3 Select Target(s) screen, where you configure the targets that will receive the event. There are three target types to select from: EventBridge event bus, EventBridge API destination, or AWS service. Clicking on each target type changes the set of information that you need to provide to define the target. We will examine two of these in more detail, the EventBridge API destination and the AWS service, starting with the AWS service.

Selecting the AWS service radio button brings up a drop-down list of AWS services that can be targeted. Select the SNS target option. This will bring up a drop-down list of the available topics. Select the topic we worked with in the previous section and click the Next button. You will have the option to configure Tags and can then Create rule.

Once we had this rule configured, we re-ran our code to send an event from the console. Within several seconds we received an email: the event went from our console application running on our local machine to EventBridge, where the rule matched the event and routed it to SNS, which then formatted and sent the email containing the order information that we submitted from the console.

Now that we have verified the rule the fun way, let’s go back into it and make it more realistic. You can edit the targets for a rule by going into Rules from the Amazon EventBridge console and selecting the rule that you want to edit. This will bring up the details page. Click on the Targets tab and then click the Edit button. This will bring you back to the Step 3 Select Target(s) screen. From here you can choose to add an additional target (you can have up to 5 targets for each rule) or replace the target that pointed to the SNS service. We chose to replace our existing target.

Since we are looking at using EventBridge to communicate between various microservices in our application we will configure the target to go to a custom endpoint. To do so requires that we choose a Target type of EventBridge API destination. We will then choose to Create a new API destination which will provide all of the destination fields that we need to configure. These fields are listed below.

·         Name – the name of the API destination. Destinations can be reused in different rules, so make sure the name is clear.

·         Description – optional value describing the destination.

·         API destination endpoint – the URL to the endpoint which will receive the event.

·         HTTP Method – the HTTP method used to send the event, can be any of the HTTP methods.

·         Invocation rate limit per second – an optional value, defaulted to 300, of the number of invocations per second. Smaller values mean that events may not be delivered.

The next section to configure is the Connection. The connection contains information about authorization as every API request must have some kind of security method enabled. Connections can be reused as well, and there are three different Authorization types supported. These types are:

·         Basic (Username/Password) – where a username and password combination is entered into the connection definition.

·         OAuth Client Credentials – where you enter the OAuth configuration information such as Authorization endpoint, Client ID, and Client secret.

·         API Key – which adds up to 5 key/value pairs in the header.

Once you have configured your authorization protocol you can select the Next button to once again complete moving through the EventBridge rules creation UI.

There are two approaches that are commonly used when mapping rules to target API endpoints. The first is a single endpoint per type of expected message. This means that if, for example, you were expecting “OrderCreated” and “OrderUpdated” messages, you would create two separate endpoints, one to handle each message. The second approach is to create a generic endpoint for your service to which all inbound EventBridge messages are sent; the code within the service then evaluates each message and handles it from there, as sketched below.
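To make the second approach concrete, here is a minimal sketch of a generic receiving endpoint using an ASP.NET Core minimal API; the /events route and the handler stubs are hypothetical, and the payload shape assumed here is the EventBridge event JSON shown earlier.

// Sketch only: a single generic endpoint that receives EventBridge events via an
// API destination and dispatches on the detail-type field.
using System.Text.Json;

var builder = WebApplication.CreateBuilder(args);
var app = builder.Build();

app.MapPost("/events", async (HttpRequest request) =>
{
    using var document = await JsonDocument.ParseAsync(request.Body);
    var root = document.RootElement;
    var detailType = root.GetProperty("detail-type").GetString();
    var detail = root.GetProperty("detail");

    switch (detailType)
    {
        case "CreateOrder":
            // hand the detail payload off to order-creation handling
            break;
        case "OrderUpdated":
            // hand the detail payload off to order-update handling
            break;
        default:
            // unknown message type; log and ignore
            break;
    }

    return Results.Ok();
});

app.Run();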

Modern Event Infrastructure Creation

So far, we have managed all the event infrastructure through the console, creating topics and subscriptions in SNS and rules, connections, and targets in EventBridge. However, taking this approach in the real world will be extremely painful. Instead, modern applications are best served by modern methods of creating services; methods that can run on their own without any human intervention. There are two approaches that we want to touch on now: Infrastructure-as-Code (IaC) and in-application code.

Infrastructure-as-Code

Using AWS CloudFormation or the AWS Cloud Development Kit (CDK) within the build and release process allows developers to manage the growth of their event infrastructure as their usage of events grows. Typically, the teams building the systems that send events are responsible for creating the infrastructure required for sending, and the teams building the systems that listen for events manage the creation of the receiving infrastructure. Thus, if you are planning on using SNS, the sending system would be responsible for adding the applicable topic(s) while the receiving system would be responsible for adding the appropriate subscription(s) to the topics in which it is interested.

Using IaC to build out your event infrastructure allows you to scale your use of events easily and quickly. It also makes it easier to manage any changes that you may feel are necessary, as it is very common for the messaging approach to be adjusted several times as you determine the level of messaging that is appropriate for the interactions needed within your overall system.
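For example, a sending team using SNS might include something like the following AWS CDK (v2) stack in its deployment pipeline. This is a minimal sketch under those assumptions; the topic name and subscription URL are placeholders, and in practice the receiving team would attach its own subscription from its own stack.

using Amazon.CDK;
using Amazon.CDK.AWS.SNS;
using Amazon.CDK.AWS.SNS.Subscriptions;
using Constructs;

// Sketch of Infrastructure-as-Code for the event infrastructure: the sending
// system's stack creates the topic it publishes to.
public class OrderEventsStack : Stack
{
    public OrderEventsStack(Construct scope, string id, IStackProps props = null)
        : base(scope, id, props)
    {
        var ordersTopic = new Topic(this, "OrdersTopic", new TopicProps
        {
            TopicName = "Orders"    // placeholder topic name
        });

        // The receiving system would normally do this in its own stack,
        // pointing at its own endpoint.
        ordersTopic.AddSubscription(
            new UrlSubscription("https://example.com/events/order"));
    }
}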

In-Application Code

In-Application code is a completely different approach from IaC as the code to create the infrastructure resides within your application. This approach is commonly used in “configuration-oriented design”, where configuration is used to define the relationship(s) that each application plays. An example of a configuration that could be used when an organization is using SNS is below.

{
    "sendrules": [ { "name": "Order", "key": "OrdersTopic" } ],
    "receiverules": [ { "name": "ProductUpdates",
                        "key": "Products",
                        "endpoint": "$URL/events/product" } ]
}

The code in the application would then ensure that every entry in the sendrules property has the appropriate topic created; using the example above, the name value represents the topic name and the key value represents the value used within the application to map to the “Order” topic in SNS. The code would then evaluate the receiverules value and create subscriptions for each entry.
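A minimal sketch of that startup logic, assuming the JSON above has been deserialized into hypothetical MessagingConfig, SendRule, and ReceiveRule types, might look like the following; SNS's CreateTopic call returns the existing topic's ARN if the topic already exists, which makes it safe to run on every startup.

using System.Collections.Generic;
using System.Threading.Tasks;
using Amazon.SimpleNotificationService;

// Sketch: make sure the topics and subscriptions described in the configuration
// exist. MessagingConfig, SendRules, and ReceiveRules are hypothetical types that
// mirror the JSON shown above.
public async Task<Dictionary<string, string>> EnsureEventInfrastructureAsync(
    MessagingConfig config)
{
    var sns = new AmazonSimpleNotificationServiceClient();
    var topicArnsByKey = new Dictionary<string, string>();

    foreach (var rule in config.SendRules)
    {
        // CreateTopic returns the ARN of the existing topic if it already exists.
        var topic = await sns.CreateTopicAsync(rule.Name);
        topicArnsByKey[rule.Key] = topic.TopicArn;
    }

    foreach (var rule in config.ReceiveRules)
    {
        var topic = await sns.CreateTopicAsync(rule.Name);
        await sns.SubscribeAsync(topic.TopicArn, "https", rule.Endpoint);
    }

    return topicArnsByKey;
}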

This seems like a lot of extra work, but for environments that do not support IaC this may be the easiest way to allow developers to manage the building of the event infrastructure. We have seen this approach built as a framework library included in every application that used events; every application provided a configuration file representing the messages it was sending and receiving, and the framework library would evaluate the service(s) to see if anything needed to be added and, if so, add it.

Amazon RDS – MySQL for .NET Developers

Originally released in 1995, MySQL has gone through a series of “owners” since then, and it is currently developed primarily by Oracle. MySQL is free and open source under the terms of the GNU General Public License (GPL). The fact that AWS does not have to pay a licensing fee is one of the primary reasons that the cost of MySQL on Amazon RDS is the lowest of all their offerings; you are only paying for hardware and management rather than hardware, licensing, and management.

MySQL may not be as fully featured or as powerful as commercial systems such as SQL Server and Oracle, but that does not mean it is not of use to a .NET developer. One of the reasons MySQL became popular is its relative simplicity and ease of use, so if all you are looking for is the ability to easily persist data in a relational database, then MySQL will more than likely support your need at a fraction of the cost of SQL Server.

MySQL and .NET

Before we dig into the RDS support for MySQL, let us first briefly go over using MySQL during .NET development. The primary use case when thinking about .NET and database products is the support for Entity Framework, .NET’s default object-relational mapping (ORM) system. If there is support for that database, then using that database in your .NET application will come down to the features of the database rather than its interaction with .NET. With that in mind, let’s look at what you need to do to use MySQL and Entity Framework in your application.

The first thing you need to do is to include the necessary NuGet package, MySql.EntityFrameworkCore. Once you have the package, next is configuring your application to use MySQL. You do this by calling the UseMySQL method when overriding the OnConfiguring method in the context class as shown below:

protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
{
    optionsBuilder.UseMySQL("connection string here");
}

A connection string for MySQL has four required fields (a sketch that puts them together follows the list):

  • server – server with which to connect
  • uid – user name
  • pwd – password
  • database – database with which to connect
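Putting those fields together, a context class might look like the following sketch; the server value is a placeholder for the RDS endpoint you will see in the next section, and Order is a stand-in for your own entity class.

using System;
using Microsoft.EntityFrameworkCore;

public class Order
{
    public int Id { get; set; }
    public DateTime OrderDate { get; set; }
}

public class OrderContext : DbContext
{
    public DbSet<Order> Orders { get; set; }

    protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
    {
        // server, uid, pwd, and database are the four required fields;
        // the endpoint and credentials below are placeholders.
        optionsBuilder.UseMySQL(
            "server=prodotnetonaws.xxxxxxxx.us-west-2.rds.amazonaws.com;" +
            "uid=admin;pwd=your-password;database=orders");
    }
}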

From here on out, it is just like working with Entity Framework and SQL Server. Kinda anti-climactic, isn’t it? Let’s now go create a MySQL database.

Setting up a MySQL Database on Amazon RDS

Now that we know how to set up our .NET application to access MySQL, let's look at setting up MySQL itself. Log into the console, go to RDS, and select Create database. On the Create database screen, select Standard create and then MySQL. Doing this will show that there is only one Edition that you can select, MySQL Community. You then have a lot of different release versions to select from; however, the NuGet package that we used in our earlier example requires a reasonably modern version of MySQL, so unless you have a specific reason to use an older version you should use the default, most up-to-date version.

Once you have defined the version of MySQL that you will use, your next option is to select the Template that you would like to use. You have three different templates to choose from:

Production – defaults are set to support high availability and fast, consistent performance.

Dev/Test – defaults are set in the middle of the range.

Free tier – defaults are set to the minimal, free version.

We are going to select the Free tier template to limit our costs for this walkthrough! This will preset many of the default values that you will see during the rest of the server configuration; more on these later.

The next section is Settings. Here you will create the DB instance identifier and the Master username and Master password, the login credentials as shown in Figure 1. Note that the DB instance identifier needs to be unique across all the account’s DB instances in this region, not just MySQL database instances. We used “prodotnetonaws” as both the instance identifier and the Master username. If you choose to Auto generate a password, you will get an opportunity to access that password immediately after the database is created.

Figure 1. Naming the DB instance and creating the master user

Scrolling down to the next section, DB instance class, will show that the instance class being used is the db.t2.micro (or comparable, depending upon when you are reading this) which is the free tier-compatible instance type. The next section down the page, Storage, is also filled out with the free version of storage, defaulting to 20 GiB of Allocated storage. Do not change either of these values to stay within the “free” level.

There are four additional sections. The first of these is Availability & durability, where you can create a replica in a different availability zone. Amazon RDS will automatically fail over to the standby in the case of a planned or unplanned outage of the primary. If you selected the “Free” template, then this whole section will be greyed out and you will not be able to create a Multi-AZ deployment. The second section is Connectivity. This is where you assign your Virtual private cloud (VPC) and Subnet group, determine whether your RDS instance has Public access, and assign a VPC security group. You can also select an Availability zone if desired. We left all these values at their default.

The third section is Database authentication. You have three options in this section, with the first being Password authentication (the default value) where you manage your database user credentials through MySQL’s native password authentication features. The second option in this section is Password and IAM database authentication where you use both MySQL’s password authentication features and IAM users and roles, and the last is Password and Kerberos authentication, where you use both MySQL’s password authentication features and an AWS Managed Microsoft Active Directory (AD) created with AWS Directory Service.

The last section when creating an RDS database is Additional configuration. This allows you to add any Database options, configure Backup, add Monitoring, and configure Logging, Maintenance, and to turn on Deletion protection. When deletion protection is enabled, you are not able to delete the database without first editing the database to turn off that setting. Select the Create database button when completed. This will bring you back to the Databases screen where you will likely see a notice that the database is being created as shown in Figure 2.

Figure 2. Notice that an RDS database is being created

If you had selected to auto generate a password, the notice shown in Figure 2 also includes a View credential details button that you need to click to see the generated value. Once the database is Available, you can interact with it like any other database, using the endpoint and port values shown in the Connectivity & security tab of the database details (see Figure 3) in the connection string in your .NET application.

Figure 3. MySQL database details screen showing endpoint and port

MySQL is the most used open-source relational database in the world. However, as mentioned earlier, Oracle took control of the project in 2010, and this created some angst amongst MySQL users. This led some of the original MySQL developers to fork the source code and create a new open-source relational database based upon the MySQL code: MariaDB. We will look at that next.

.NET and Amazon DynamoDB

Where DocumentDB is a document store that is built around the MongoDB API, DynamoDB is a fully managed, serverless, NoSQL, key-value database. While originally built as a key-value database, DynamoDB also supports document data models, which allows you to get the speed of key-value databases along with the ability to do searches within the document that was stored as a value. It uses built-in horizontal scaling to scale to more than 10 trillion requests per day with peaks greater than 20 million requests per second over petabytes of storage. In other words, DynamoDB scales really well!

There are two different capacity modes in which you can configure each DynamoDB database, On-demand and Provisioned. If these terms sound familiar, that should not be a surprise as these are the same options you get with Amazon Aurora, AWS’s serverless relational database offering. With on-demand capacity, you pay per request for data reads and writes into your database. You do not have to specify how many reads or writes there will be as DynamoDB scales itself to support your needs as that number increases or decreases. With provisioned capacity, you specify the number of reads and writes per second. You can use auto-scaling to adjust capacity based on the utilization for when the usage of your system may grow. Provisioned capacity is less expensive, but you may end up over-provisioning your server and paying more than you need.

Setting up a DynamoDB Database

The easiest way to get a feel for DynamoDB is to use it, so let’s set one up. Log in to the console, and either search for DynamoDB or find it using Services > Database > DynamoDB. Select the Create table button on the dashboard to bring up the Create table page. The first setup section is Table details and is shown in Figure 1.

Figure 1. Creating a DynamoDB table

There are three different values that you can enter. The first, Table name, is straightforward and needs to be unique by region. We used “Person” to identify the type of data that we are going to be storing in the table. The second value is the Partition key and the third value is an optional Sort key. A simple primary key is made up only of the partition key, and no two items in the table can share the same partition key. A composite primary key, on the other hand, is made up of both a partition key and a sort key. All items with the same partition key are stored together, sorted by the sort value. If using a composite primary key, you can have multiple instances of a partition key, however, the combination of partition key and sort key must be unique.

Note – One instance where we have seen a composite key used to great effect is when different versions are being kept. As a new version is created, it gets the same partition key but a new sort key, in that case, a version number.

The keys can have different types, Binary, Number, and String. In our case, as you can see in Figure 1, we created a string partition key of “Id” without a sort key.

The next configuration section on the page is Settings, where you can select either the Default settings or Customize settings. Going with the default settings takes all the fun out of the rest of the configuration section of this article, so select to customize settings.

The next section is Table class, where you have two options, DynamoDB Standard and DynamoDB Standard-IA (Infrequent Access). The determination between these two is based upon the frequency with which data will be accessed: the less frequently your table will have reads and writes performed against it, the more likely that the Standard-IA class will be appropriate.

Next comes the Capacity calculator section. This is an interesting tool that helps you translate your typical (or predicted) usage into the generic Read and Write units that are used for configuration, pricing, and billing. You will need to expand the section before it becomes fully available, but when you do you will get a series of fields as shown in Figure 2.

Figure 2. DynamoDB Capacity calculator

The questions that it asks are straightforward: how big your average payload is, how many reads per second you will have, how many writes per second, and what your business requirements are for those reads and writes. Let's look at those options in a bit more detail. Average item size (KB) is an integer field capturing your average payload size rounded to the nearest kilobyte. This can be frustrating, because your payloads may often be considerably smaller than a kilobyte – but go ahead and choose 1 if your payloads are relatively small. The Item read/second and Item write/second fields are also straightforward integers; we used 25 items read per second and 4 items written per second.

The Read consistency and Write consistency fields are a little different as they are drop-downs. Read offers Eventually consistent, where it is possible that a read may not have the most recent version of the data (because it is coming from a read-only copy of the data), Strongly consistent where all reads will have the most recent version of the data (because it is coming from the primary table) and Transactional where multiple actions are submitted as a single all-or-nothing operation. Write consistency offers two approaches, Standard where the data is inserted into the primary, and Transactional which is the same as for Read consistency. In our example, we selected Strongly consistent for reads and Standard for writes. The calculator then estimated our costs at $4.36 a month in US-West-2. As you can see, it’s a pretty inexpensive option.

The next section in the table creation screen is the Read/write capacity settings. This is where the two modes we touched on earlier come in, On-demand and Provisioned. Since we went through the calculator and estimated a whopping $4.36 a month charge, we will go ahead and use the simpler On-demand option.

The next section is Secondary indexes. This is where DynamoDB varies from a lot of other Key/Value stores because it allows you to define indexes into the content – which tends to be more of a document database approach. There are two types of secondary indexes, Global and Local.  A global secondary index provides a different partition key than the one on the base table while a local secondary index uses the same partition key and a different sort key. The local secondary index requires that the base table already be using both a partition key and a sort key, but there is no such constraint on the use of a global secondary key.

The next configuration section is Encryption at rest. There are three choices: Owned by Amazon DynamoDB, where DynamoDB owns and manages the encryption keys; AWS managed key, where AWS creates the keys and then manages them within your AWS Key Management Service (KMS); and Customer managed key, where you create and manage the KMS key yourself. We selected the key owned by Amazon DynamoDB.

The last configuration section is Tags. Once your table is configured, select the Create table button at the bottom. This will bring up the Tables listing with your table in a “Creating” status as shown in Figure 3. Once the creation is complete, the status will change to “Active”.

Figure 3. Tables listing after creating a table.

Unlike Amazon DocumentDB, DynamoDB gives you the ability to work directly with the data within the table. Once the table has been created, click on the table name to go into the table detail screen. This page gives you a lot of information, including metrics on table usage. We won't go into that in much detail, simply because it could use its own book; instead, click on the Explore table items button at the top. This will bring you to a page where you can interact with the items within the table. There is also a Create item button at the top. We used this button to create two simple items in the table, the first of which is shown in Figure 4.

Figure 4. Creating an item in Amazon DynamoDB.

If this format seems a little unusual, it is because this is the DynamoDB JSON, which is different from “traditional” JSON in that it stores the items as their own key/value pairs. If you turn off the View DynamoDB JSON selector at the top, then you will see the more standard JSON:

{
 "Id": "{29A25F7D-C2C1-4D82-9996-03C647646428}",
 "FirstName": "Bill",
 "LastName": "Penberthy"
}

DynamoDB and AWS Toolkit for Visual Studio

Unlike DocumentDB, which has no support in any of the IDE toolkits, you have the ability to access DynamoDB from the Toolkit for Visual Studio. Using the toolkit, you can both view the table and look at items within the table as shown in Figure 5.

Figure 5. Using the Toolkit for Visual Studio to view DynamoDB items

You can even use the toolkit to filter returns by selecting the Add button within the top box. This will add a line with a drop-down that includes all of the field names (Id, FirstName, and LastName) and allow you to enter a filter value. Selecting “LastName” “Equal” “Steve” and then clicking the Scan Table button will result in only one result remaining in the list as shown in Figure 6.

Figure 6. Filtering DynamoDB items in Visual Studio

The toolkit will also allow you to add, edit, and delete items simply by double-clicking on the item in the result list. Since you are working with the items in a list form, you can even use the Add Attribute button to add a new “column” where you can capture new information. Once you add a value to that new column and Commit Changes, those items (where you added the value) will be updated.

As you can imagine, the ability to interact with the data directly in Visual Studio makes working with the service much easier, as you can look directly into the data to understand what you should get when parts of your code are run in Debug or when running integration tests. Unfortunately, however, this functionality is only supported in the AWS Toolkit for Visual Studio and is not available in either Rider or Visual Studio Code toolkits.

DynamoDB and .NET

The last step is to take a look at using DynamoDB within your .NET application. As mentioned earlier, using DynamoDB means that you will not be using Entity Framework as you did with the relational databases earlier. Instead, we will be using a DynamoDB context, which provides very similar support to the DbContext in Entity Framework.

Note: One of the interesting features of using DynamoDB within your development process is the availability of a downloadable version of DynamoDB. Yes, you read that correctly, you can download and locally install a version of the DynamoDB as either a Java application, an Apache Maven dependency, or as a Docker image. Details on this can be found at https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/DynamoDBLocal.html
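If you do use the local version, you can point the SDK client at it through its configuration object rather than at the AWS service endpoint. A minimal sketch, assuming DynamoDB Local is listening on its default port of 8000:

using Amazon.DynamoDBv2;

// Sketch: direct the SDK at a locally running DynamoDB rather than the AWS service.
var localConfig = new AmazonDynamoDBConfig
{
    ServiceURL = "http://localhost:8000"    // default DynamoDB Local endpoint
};
var localClient = new AmazonDynamoDBClient(localConfig);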

In many ways, the .NET SDK for DynamoDB is one of the more advanced SDKs as it offers support in three layers:

Low-level interface – the APIs in this interface relate very closely to the service model and there is minimal help functionality.

Document interface – This API includes constructs around the Document and Table classes so there is minimal built-in functionality to help do things like converting to business objects.

High-level interface – This is where AWS provides support around converting Documents to .NET classes and other helpful interactions.

Your code can interact with any of the interfaces based upon your business need. We will be relying on the high-level interface as we move into the code examples.

First, you need to add the appropriate NuGet package, AWSSDK.DynamoDBv2. Once you have that added, the next thing that you need to do is to configure your connection to the database. The following code snippet shows a constructor method to do this.

private AmazonDynamoDBClient client;
private DynamoDBContext context;

public DataClient()
{
    client = new AmazonDynamoDBClient();
    context = new DynamoDBContext(client);
}

There are two objects introduced in this snippet. The first class introduced is the Amazon.DynamoDBv2.AmazonDynamoDBClient. This class provides the default implementation for accessing the service. The constructor used in the example will default to credentials stored in the application's default configuration. Running this on your local machine means that the application will use your “default” profile to connect. There are other constructors that you can use, ranging from passing in your Access Key ID and Secret Key to passing in an AWSCredentials object. For this example, however, we will stick with the default constructor. The second object introduced is the Amazon.DynamoDBv2.DataModel.DynamoDBContext. The DataModel part of the namespace indicates that this is a high-level interface based upon the low-level interface offered by the AmazonDynamoDBClient class.
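For illustration, a couple of those alternative constructors are sketched below; the region and credential values are placeholders, and in most cases the default credential chain is the better choice.

using Amazon;
using Amazon.DynamoDBv2;
using Amazon.Runtime;

// Explicit region, still using the default credential chain.
var regionClient = new AmazonDynamoDBClient(RegionEndpoint.USWest2);

// Explicit credentials (placeholder values shown) plus a region.
var credentialClient = new AmazonDynamoDBClient(
    new BasicAWSCredentials("ACCESS_KEY_ID", "SECRET_ACCESS_KEY"),
    RegionEndpoint.USWest2);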

Now that we have defined the context, let’s look at how you would use it. The following is a method to save an object into the table.

public async Task SaveAsync<T>(T item)
{
    await context.SaveAsync<T>(item);
}

This is where you start to see the power offered by the high-level interface. Let’s step out of this class and look at how this is used.

public class Person
{
    public string Id { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

public async Task<string> Add(string firstName, string lastName)
{
    var client = new DataClient();

    Person itemToAdd = new Person { 
        Id = Guid.NewGuid().ToString("B").ToUpper(), 
        FirstName = firstName, 
        LastName = lastName 
    };

    await client.SaveAsync<Person>(itemToAdd);
    return itemToAdd.Id;
}

This Add method is taking in a first name and a last name, creates a Person object, persists the information to DynamoDB, and then returns the Id to the calling method. And that is what the high-level interface offers. You could do the same work yourself using the Document interface, but you would have to manage all of the serialization and deserialization necessary to convert from the business objects to the JSON that is stored in the table.

One other feature of the high-level interface is much less obvious. Think about when we created the DynamoDB table earlier, and the name that we used – “Person”. By default, the high-level interface expects the class name of the item being persisted to be the same as the table name, as it is in our case.
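If your class name and table name do not match, or you want the key property called out explicitly, the high-level interface provides mapping attributes. A brief sketch, assuming the same Person table from earlier:

using Amazon.DynamoDBv2.DataModel;

// Sketch: explicit mapping between a .NET class and the DynamoDB table.
[DynamoDBTable("Person")]
public class Person
{
    [DynamoDBHashKey]    // the partition key we defined when creating the table
    public string Id { get; set; }

    [DynamoDBProperty("FirstName")]    // optional; useful when the attribute name differs
    public string FirstName { get; set; }

    public string LastName { get; set; }
}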

We just went over adding an item to the table through the high-level interface. Let’s now look at an example of retrieving an item.

public async Task<T> FindByIdAsync<T>(string id)
{
    var condition = new List<ScanCondition> {
        new ScanCondition("Id", ScanOperator.Equal, id) };
    AsyncSearch<T> search = context.ScanAsync<T>(condition);
    var list = await search.GetRemainingAsync();
    return list.FirstOrDefault();
}

You can see that this gets a little more complicated.  Because this code is doing a scan of the data, it is going to always return a List<T>, even though we set the Id as the primary key on the table. This happens because the high-level interface does not know anything about the definition of the table itself and thus generalizes the result set.

This scanning approach should not feel new, however. Think back to how the filtering was set in the AWS Toolkit for Visual Studio (Figure 6) and you will see that this is the same approach. This approach is used because of those enhancements into DynamoDB that make it more document database-like; it allows you to scan through the data looking for a specific condition, in this case, the Id equal to the value passed into the FindByIdAsync method. And, just as shown in the toolkit, you can use multiple conditions.

public async Task<List<T>> FindByValueAsync<T>(Dictionary<string,object> searchDict)
{
    var conditions = new List<ScanCondition>();
    foreach(string key in searchDict.Keys)
    {
        conditions.Add(
          new ScanCondition(key,
                    ScanOperator.Equal,
                    searchDict[key]));
    }
    AsyncSearch<T> search = context.ScanAsync<T>(conditions);
    return await search.GetRemainingAsync();
}

In this instance, we are simply accepting a Dictionary<string, object> where we assume the key will be the field name, such as LastName, and the dictionary value will be the value to use when filtering. An empty dictionary means that no filters will be set, which, as you can imagine, would be somewhat terrifying if you consider a massive table with petabytes of data. That's where the return type of the ScanAsync method comes into play: the AsyncSearch<T> class.

AsyncSearch<T> is an intermediate class that provides several ways of interacting with the service. In the code example above, the method used on that object was GetRemainingAsync(). The GetRemainingAsync method gets all the remaining items that match the filter condition and brings them back as a single unit. However, there is another method on AsyncSearch, GetNextSetAsync, which returns a finite set of items – up to 1 MB of items. You can examine a property on the AsyncSearch object, IsDone, which tells you whether the current result set is the final one, giving you the ability to manage the pagination yourself.
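A sketch of that pagination pattern, reusing the same context and scan conditions as the earlier examples, might look like this:

// Sketch: page through scan results with GetNextSetAsync rather than pulling
// everything back at once with GetRemainingAsync.
public async Task<List<T>> FindByValuePagedAsync<T>(List<ScanCondition> conditions)
{
    var results = new List<T>();
    AsyncSearch<T> search = context.ScanAsync<T>(conditions);

    while (!search.IsDone)
    {
        // Each call returns the next batch of matching items.
        List<T> page = await search.GetNextSetAsync();
        results.AddRange(page);
        // A real implementation might stop early or stream each page to the caller.
    }

    return results;
}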

We have spent time going through the High-level interface provided by the SDK. We have not even touched on the powerful Document interface and how that provides more specific control over the values stored in DynamoDB. There are many examples of development teams using the Document-level interfaces and writing their own high-level interfaces where they can incorporate their specific requirements rather than using the SDK’s more generic approach. That approach is not wrong, as no one understands your needs as well as you do – but having the High-level interface allows you to easily get a large majority of requirements fulfilled and then you can customize using the Document interface as you see fit.

There is a lot more that we can go into about using DynamoDB with .NET, like an entire book, so we won’t do that here. Let it suffice to say, yes, you can use DynamoDB with .NET.

What kind of “NoSQL” are you talking about?

Too many times I hear NoSQL databases discussed as if they were a single way of “doing business.” However, there are multiple types of NoSQL databases, each with its own specializations and ways of persisting and accessing information. So just saying “use NoSQL” isn't really all that useful.

With that in mind, let's take a look at the various choices that you have to consider. The main flavors of NoSQL databases are document databases, key-value stores, column-oriented databases, and graph databases. There are also databases that are a combination of these, and even some highly specialized databases that do not depend on relationships and SQL, so they could probably be called NoSQL as well. That being said, let's look at those four primary types of NoSQL databases.

Document Databases

A document database stores data as JSON (as shown in Figure 1), BSON, or XML. BSON stands for “Binary JSON.” BSON's binary structure allows for the encoding of type and length information. This encoding tends to allow the data to be parsed much more quickly, and it helps support indexing and querying into the properties of the document being stored.

Figure 1. Document database interpretation of object

Documents are typically stored in a format that is much closer to the data model being worked with in the application. The example above shows this clearly. Additional functionality supported by document databases tends to center on ways of indexing and/or querying data within the document. The Id will generally be the key used for primary access, but imagine a case where you want to search the data for all instances of a Person with a FirstName of “Steve”. In SQL that is easy because that data is segregated into its own column. It is more complex in a document database because that query must now go rummaging around the data within the document, which, as you can probably imagine, will be slow without a lot of work around query optimization. Common examples of document databases include MongoDB, Amazon DocumentDB, and Microsoft Azure Cosmos DB.

Key-Value Stores

A key-value store is the simplest of the NoSQL databases. In these types of databases, every element in the database is stored as a simple key-value pair made up of an attribute, or key, and the value, which can be anything. Literally, anything. It can be a simple string, a JSON document like is stored in a document database, or even something very unstructured like an image or some other blob of binary data.

The difference between Document databases and Key-Value databases is generally their focus. Document databases focus on ways of parsing the document so that users can do things like making decisions about data within that document. Key-value databases are very different. They care much less what the value is, so they tend to focus more on storing and accessing the value as quickly as possible and making sure that they can easily scale to store A LOT of different values. Examples of Key-Value databases include Amazon DynamoDB, BerkeleyDB, Memcached, and Redis.

Those of you familiar with the concept of caching, where you create a high-speed data storage layer that stores a subset of data so that accessing that data is more responsive than calling directly into the database, will see two products that are commonly used for caching, Memcached and Redis. These are excellent examples of the speed vs. document analysis trade-off between these two types of databases.

Column-Oriented Databases

Yes, the name of these types of NoSQL databases may initially be confusing because you may think about how the data in relational databases are broken down into columns. However, the data is accessed and managed differently. In a relational database, an item is best thought of as a row, with each column containing an aspect or value referring to a specific part of that item. A column-oriented database instead stores data tables by column rather than row.

Let’s take a closer look at that since that sounds pretty crazy. Figure 2 shows a snippet of a data table like you are used to seeing in a relational database system, where the RowId is an internal value used to refer to the data (only within the system) while the Id is the external value used to refer to the data.

Figure 2. Data within a relational database table, stored as a row.

Since row-oriented systems are designed to efficiently return data for a row, this data could be saved to disk as:

1:100,Bob,Smith;
2:101,Bill,Penberthy;

In a column-oriented database, however, this data would instead be saved as

100:1,101:2;
Bob:1,Bill:2;
Smith:1,Penberthy:2

Why would they take that kind of approach? Mainly, speed of access when pulling a subset of values for an “item”. Let's look at an index for a row-based system. An index is a structure associated with a database table to speed the retrieval of rows from that table. Typically, the index sorts the list of values so that it is easier to find a specific value. If you consider the data in Figure 2 and know that it is very common to query on a specific LastName, then creating a database index on the LastName field means that a query looking for “Penberthy” would not have to scan every row in the table to find the data, as that price was paid when building the index. Thus, an index on the LastName field would look something like the snippet below, where all LastNames have been alphabetized and stored independently:

Penberthy:2;
Smith:1

If you look at this definition of an index, and look back at the column-oriented data, you will see a very similar approach. What that means is that when accessing a limited subset of data or the data is only sparsely populated, a column-oriented database will most likely be much faster than accessing a row-based database. If, on the other hand, you tend to use a broader set of data then storing that information in a row would likely be much more performant. Examples of column-oriented systems include Apache Kudu, Amazon Redshift, and Snowflake.

Graph Databases

The last major type of NoSQL database is the graph database. A graph database focuses on storing nodes and relationships. A node is an entity, generally stored with a label, such as Person, and contains a set of key-value pairs, or properties. That means you can think of a node as being very similar to a document stored in a document database. However, a graph database takes this much further, as it also stores information about relationships.

A relationship is a directed and named connection between two different nodes, such as Person Knows Person. A relationship always has a direction, a starting node, an ending node, and a type of relationship. A relationship also can have a set of properties, just like a node. Nodes can have any number or type of relationships and there is no effect on performance when querying those relationships or nodes. Lastly, while relationships are directed, they can be navigated efficiently in either direction.

Let's take a closer look at this. In this example, there are two people, Bill and Bob, who are friends. Both Bill and Bob watched the movie “Black Panther”. This means there are 3 nodes with 4 known relationships:

  • Bill is friends with Bob
  • Bob is friends with Bill
  • Bill watched “Black Panther”
  • Bob watched “Black Panther”     

Figure 3 shows how this is visualized in a graph model.

Figure 3. Graph model representation with relationships

Since both nodes and relationships can contain additional information, we can include more information in each node, such as a LastName for the Person nodes or a release date for the Movie, and we can add additional information to a relationship, such as a Date for when each person watched the movie or for when they became friends.

Graph databases tend to have their own way of querying data because you can be pulling data based upon both nodes and relationships. There are 3 common query approaches:

·         Gremlin – part of the Apache TinkerPop project, used for creating and querying property graphs. A query to get everyone Bill knows would look like this:

g.V().has("FirstName","Bill").
  out("Is_Friends_With").
  values("FirstName")

·         SPARQL – a World Wide Web Consortium (W3C) supported declarative language for graph pattern matching. A query to get everyone Bill knows would look like this:

PREFIX vcard: <http://www.w3.org/2001/vcard-rdf/3.0#>
SELECT ?friendName
WHERE {
  ?person vcard:FN "Bill" .
  ?person vcard:Is_Friends_With ?friend .
  ?friend vcard:FN ?friendName
}

·         openCypher – a declarative query language for property graphs that was originally developed by Neo4j and then open-sourced. A query to get everyone Bill knows would look like this:

MATCH (me)-[:Is_Friends_With*1]-(remote_friend)
WHERE me.FirstName = 'Bill'
RETURN remote_friend.FirstName

The most common graph databases are Neo4j, ArangoDB, and Amazon Neptune.

Next time, don't just say “let's put that in a NoSQL database.” Instead, say what kind of NoSQL database makes sense for that combination of data and business need. That way, it is obvious you know what you are saying and have put some thought into the idea!

Selecting a BI Data Stack – Snowflake vs Azure Synapse Analytics

This article is designed to help choose the appropriate technical tools for building out a Data Lake and a Data Warehouse.

Components

A Data Lake is a centralized repository that allows you to store structured, semi-structured, and unstructured data at scale. The idea is that copies of all your transactional data are gathered, stored, and made available for additional processing, whether it is transformed and moved into a data warehouse or made available for direct usage for analytics such as machine learning or BI reporting.

A Data Warehouse, on the other hand, is a digital storage system that connects and harmonizes large amounts of data from many different sources. Its purpose is to feed business intelligence (BI), reporting, and analytics, and support regulatory requirements – so companies can turn their data into insight and make smart, data-driven decisions. Data warehouses store current and historical data in one place and act as the single source of truth for an organization.

While a data warehouse is a database structured and optimized to provide insight, a data lake is different because it stores relational data from line of business applications and non-relational data from other systems. The structure of the data or schema is not defined when data is captured. This means you can store all your data without careful design or the need-to-know what questions you might need answers for in the future. Different types of analytics on your data like SQL queries, big data analytics, full text search, real-time analytics, and machine learning can be used to uncover insights.

General Requirements 

There are a lot of different ways to implement a data lake. The first is a simple object storage system such as Amazon S3. Another is a combination of systems, such as tying together a database in which to store relational data and an object storage system to store unstructured or semi-structured data, or adding in stream processing, a key/value database, or any number of different products that handle the job similarly. With all these different approaches potentially fitting the definition of “data lake”, the first thing we must do is define more precisely the data lake requirements that we will use for the comparison.

  • Relational data – one requirement is that the data stack include multiple SaaS systems that will be providing data to the data lake. These systems default to relational/object-based structures where objects are consistently defined and are inter-related. This means that this information will be best served by storing it within a relational database system. Examples of systems providing this type of data include Salesforce, Marketo, OpenAir, and NetSuite.
  • Semi-structured data – some of the systems that will act as sources for this data lake contain semi-structured data, which is data that has some structure but does not necessarily conform to a data model. Semi-structured data may mostly “look alike”, but each piece of data may have different attributes or properties. Data from these sources could be transformed into a relational form, but this transformation gets away from storing the data in its original state. Examples of sources of this kind of data include logs, such as Splunk or application logs.
  • Unstructured data – Unstructured data is data which does not conform to a data model and has no easily identifiable structure, such as images, videos, documents, etc. At this time, there are no identified providers of unstructured data that is expected to be managed within the data lake.

Thus, fulfilling this set of data lake requirements means support for both relational\structured and semi-structured data are “must-haves”, and the ability to manage unstructured data is a “nice-to-have”.

The next set of requirements applies to both data lakes and data warehouses and concerns being able to add, access, and extract data, with many more requirements around being able to manage data as it is being read and extracted. These requirements are:

  • Input – The data lake must be able to support write and edit access from external systems. The system must support authentication and authorization as well as the ability to automate data movement without having to rely on internal processes. Thus, a tool requiring a human to click through a UI is not acceptable.
  • Access – the data lake must be able to support different ways of accessing data from external sources, so the tool:
    • must be able to support the processing of SQL (or similar approaches) for queries to limit/filter data being returned
    • must be supported as a data source for Tableau, Excel, and other reporting systems
    • should be able to give control over the number of resources used when executing queries to have control over the performance/price ratio

Top Contenders

There are many products available that support data lake needs, data warehouse needs, or both. However, one of the key pivots was to consider only offerings that are available as a managed service in the cloud, rather than products installed and maintained on-premises, and that can support both data lake and data warehouse needs. With that in mind, we narrowed the list down to five contenders: 1) Google BigQuery, 2) Amazon Redshift, 3) Azure Synapse Analytics, 4) Teradata Vantage, and 5) Snowflake Data Cloud, based upon the Forrester Wave™ on Cloud Data Warehouses study shown below in Figure 1.

Figure 1 – Forrester Wave on Cloud Data Warehouses

The Forrester scoring was based upon 25 different criteria, of which we pulled out the seven that were the most important to our defined set of requirements and major use cases (in order of importance):

  1. Data Ingestion and Loading – The ability to easily insert and transform data into the product
  2. Administration Automation – An ability to automate all the administration tasks, including user provisioning, data management, and operations support
  3. Data Types – The ability to support multiple types of data, including structured, semi-structured, unstructured, files, etc.
  4. Scalability Features – The ability to easily and automatically scale for both storage and execution capabilities
  5. Security Features – The ability to control access to data stored in the product. This includes management at database, table, row, and column levels.
  6. Support – The strength of support offered by the vendor, including training, certification, knowledge base, and a support organization.
  7. Partners – The strength of the partner network, other companies that are experienced in implementing the software as well as creating additional integration points and/or features.
Forrester Scores               Google BigQuery   Azure Synapse Analytics   Teradata Vantage   Snowflake Data Cloud   Amazon Redshift
Data Ingestion and Loading     5.00              3.00                      3.00               5.00                   5.00
Administration Automation      3.00              3.00                      3.00               5.00                   3.00
Data Types                     5.00              5.00                      5.00               5.00                   5.00
Scalability Features           5.00              5.00                      3.00               5.00                   3.00
Security Features              3.00              3.00                      3.00               5.00                   5.00
Support                        5.00              5.00                      3.00               5.00                   5.00
Partners                       5.00              1.00                      5.00               5.00                   5.00
Average of Forrester Scores    4.43              3.57                      3.57               5.00                   4.43
Table 1 – Base Forrester scores for selected products

This shows a difference between the overall scores, which are based on the full list of 25 criteria, and the scores based only on the areas where our requirements pointed to the most need. Snowflake was an easy selection for the in-depth, final evaluation, but choosing which of the other products to dig deeper into was more problematic. In this first evaluation we compared Snowflake to Azure Synapse Analytics; other comparisons will follow.

Final Evaluation

There are two additional areas in which we dive deeper into the differences between Synapse and Snowflake: cost and execution processing. Execution processing is important because the more granular the ability to manage execution resources, the more control the customer has over performance in their environment.

Cost

Calculating cost for data platforms generally comes down to four areas: data storage, processing memory, processing CPUs, and input/output. Azure offers two pricing approaches for Synapse, Dedicated and Serverless. Dedicated pricing is broken down into two values: storage and an abstract representation of compute resources and performance called a Data Warehouse Unit (DWU). The Serverless consumption model, on the other hand, charges by the TB of data processed, so a query that scans the entire database will be very expensive while small, very concise queries will be relatively inexpensive. Snowflake has taken an approach similar to the Dedicated model, with its abstract representation called a Virtual Warehouse (VW). We will build a pricing example that includes both data lake and data warehouse usage, using the following customer definition:

  • 20 TBs of data, adding 50 GB a week
  • 2.5 hours a day on a 2 CPU, 32GB RAM machine to support incoming ETL work.
  • 10 users working 8 hours, 5 days a week, currently supported by a 4 CPU, 48GB RAM machine
Product         Description            Price       Notes
Synapse Ded.    Storage Requirements   $  4,600    20 TB * $23 (per TB month) * 12 months
Synapse Ded.    ETL Processing         $  2,190    400 Credits * 2.5 hours * 365 days * $0.006 a credit
Synapse Ded.    User Processing        $ 11,520    1000 Credits * 8 hours * 240 days * $0.006 a credit
Synapse Ded.    Total                  $ 18,310
Synapse Serv.   Storage (Serverless)   $  4,560    20 TB * $19 (per TB month) * 12 months
Synapse Serv.   Compute                $  6,091    ((5 TB * 240 days) + (.05 TB * 365 days)) * $5
Synapse Serv.   Total                  $ 10,651
Snowflake       Storage Requirements   $  1,104    4 TB (20 TB uncompressed) * $23 (per TB month)
Snowflake       ETL Processing         $  3,650    2 Credits * 2.5 hours * 365 days * $2 a credit
Snowflake       User Processing        $ 15,360    4 Credits * 8 hours * 240 days * $2 a credit
Snowflake       Total                  $ 20,114
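
To make the arithmetic behind these estimates explicit, here is a minimal C# sketch of the Synapse Serverless and Snowflake line items. The rates and credit counts are the assumptions used for this comparison, not published list prices.

// Minimal sketch of the arithmetic behind the Synapse Serverless and Snowflake estimates above.
// All rates and credit counts are assumptions from this comparison, not vendor list prices.
double synapseServerless =
      20 * 19 * 12                        // storage: 20 TB at an assumed $19 per TB-month
    + ((5 * 240) + (0.05 * 365)) * 5;     // compute: TB scanned per day at an assumed $5 per TB

double snowflake =
      4 * 23 * 12                         // storage: ~4 TB compressed at an assumed $23 per TB-month
    + 2 * 2.5 * 365 * 2                   // ETL warehouse: 2 credits/hour at an assumed $2 a credit
    + 4 * 8 * 240 * 2;                    // user warehouse: 4 credits/hour at an assumed $2 a credit

Console.WriteLine($"Synapse Serverless: {synapseServerless:N0}, Snowflake: {snowflake:N0}");
// Prints roughly 10,651 and 20,114, matching the totals in the table above.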

As an estimate, before considering any discounts, Synapse is approximately 10% less expensive than Snowflake when using Dedicated consumption and almost 50% less costly when using Serverless consumption. The Serverless figure is very much an estimate, though, because Serverless charges by the TB processed per query, so large or poorly constructed queries can be dramatically more expensive. However, the Snowflake plan gets some cost relief because the movement of data from the data lake to the data warehouse is simplified. Snowflake's structure and custom SQL language allow data to be moved between databases using SQL, so there is only a one-time processing fee for that SQL execution rather than paying both for a query to get data out of one system and for the execution to load it into the other. Synapse offers some of the same capability, but much of it sits outside the data management system, and the advantage gained in the process is less than that offered by Snowflake.

Execution Processing

The ability to flexibly scale the resources used when executing a command is invaluable, and both products offer this to some extent. Snowflake uses the concept of a warehouse, which is simply a cluster of compute resources such as CPU, memory, and temporary storage. A warehouse can be either running or suspended, and charges accrue only while it is running. Snowflake's flexibility centers on these warehouses: the user can provision warehouses of different sizes and then select the desired warehouse, or processing level, when connecting to the server to perform work, as shown in Figure 2. This allows the processing to be matched to the query being run. Selecting a single item from the database by its primary key may not need the most powerful (and most expensive) warehouse, but a report doing complete table scans across TBs of data could use the boost. Another example would be a scheduled, generated report where slightly longer processing time is not an issue.

Figure 2 – Switching Snowflake warehouses to change dedicated compute resources for a query

Figure 3 shows the connection management screen in the integration tool Boomi, with the arrow pointing towards where you can set the desired warehouse when you connect to Snowflake.

Figure 3 – Boomi connection asking for Warehouse name
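
The same per-query warehouse selection can be done from .NET code. The following is a minimal sketch assuming the Snowflake.Data ADO.NET driver; the account, credentials, database, and warehouse names are placeholders.

using Snowflake.Data.Client;

// Connection string values below are placeholders for this sketch.
using var conn = new SnowflakeDbConnection
{
    ConnectionString = "account=myaccount;user=myuser;password=mypassword;db=SALES;schema=PUBLIC;warehouse=REPORTING_XS"
};
conn.Open();

using var cmd = conn.CreateCommand();

// A cheap point lookup runs on the small warehouse chosen in the connection string.
cmd.CommandText = "SELECT * FROM ORDERS WHERE ORDER_ID = 42";
using (var reader = cmd.ExecuteReader()) { /* read the single row */ }

// Switch the session to a larger warehouse before running a heavy scan.
cmd.CommandText = "USE WAREHOUSE REPORTING_XL";
cmd.ExecuteNonQuery();

cmd.CommandText = "SELECT REGION, SUM(AMOUNT) FROM ORDERS GROUP BY REGION";
using (var reader = cmd.ExecuteReader()) { /* read the aggregated rows */ }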

Synapse takes a different approach. You either use the Serverless model, where you have no control over the processing resources, or the Dedicated model, where you determine the level of resource consumption when you create the SQL pool that will store your data; you do not have the ability to scale the execution resources up or down for an individual query.

Due to this flexibility, Snowflake has the advantage in execution processing.

Recommendation

In summary, Azure Synapse Analytics has an advantage on cost while Snowflake has an advantage in the flexibility with which you can manage system processing resources. When bringing in the previous scoring provided by Forrester, Snowflake has the overall advantage based on its strength in data ingestion and administration automation, both areas that help speed delivery and ease the burden of operational support. Based on this analysis, my recommendation would be to use Snowflake for the entire data stack, including both the data lake and the data warehouse, rather than Azure Synapse Analytics.

.NET and Containers on AWS (Part 2: AWS App Runner)

The newest entry into AWS container management does a lot to reduce the amount of configuration and management required when working with containers. App Runner is a fully managed service that automatically builds and deploys the application as well as creating the load balancer. App Runner also manages scaling up and down based upon traffic. What do you, as a developer, have to do to get your container running in App Runner? Let's take a look.

First, log into AWS and go to the App Runner console home. If you click on the All Services link you will notice a surprising AWS decision: App Runner is found under the Compute section rather than the Containers section, even though its purpose is to easily run containers. Click on the Create an App Runner Service button to get the Step 1 page as shown below:

Creating an App Runner service

The first section, Source, requires you to identify where the container image that you want to deploy is stored. At the time of this writing, you can choose either a container registry (Amazon ECR) or a source code repository. Since we have already loaded an image into ECR, let us move forward with this option by ensuring Container registry and Amazon ECR are selected, and then clicking the Browse button to bring up the image selection screen as shown below.

Selecting a container image from ECR

In this screen we selected the “prodotnetonaws” image repository that we created in the last post and the container image with the tag of “latest”.

Once you have completed the Source section, the next step is to determine the Deployment settings for your container. Here, your choices are to use the Deployment trigger of Manual, which means that you must fire off each deployment yourself using the App Runner console or the AWS CLI, or Automatic, where App Runner watches your repository, deploying the new version of the container every time the image changes. In this case, we will choose Manual so that we have full control of the deployment.

Warning: When you have your deployment settings set to Automatic, every update to the image will trigger a deployment. This may be appropriate in a development or even test environment, but it is unlikely that you will want to use this setting in a production setting.

The last piece of information that you need to enter on this page is an ECR access role that App Runner will use to access ECR. In this case, we will select Create new service role and App Runner will pre-fill a suggested service role name. Click the Next button when completed.

The next step is entitled Configure service and is designed to, surprisingly enough, help you configure the service. There are five sections on this page: Service settings, Auto scaling, Health check, Security, and Tags. Only the first section is expanded; all of the other sections need to be expanded before you can see their options.

The first section, Service settings, with default settings can be seen below.

Service settings in App Runner

Here you set the Service name, select the Virtual CPU & memory, configure any optional Environment variables that you may need, and determine the TCP Port that your service will use. If you are using the sample application that we loaded into ECR in the previous post, you will need to change the port value from the default 8080 to port 80 so that it will serve the application we configured in the container. You also have the ability, under Additional configuration, to add a Start command that will be run on launch; this is generally left blank if you have configured the entry point within the container image. We gave the service the name "ProDotNetOnAWS-AR" and left the rest of the settings in this section at their defaults.

The next section is Auto scaling, and there are two major options, Default configuration and Custom configuration, each of which provides the ability to set the auto scaling values shown below.

Setting the Auto scaling settings in App Runner

The first of these auto scaling values is Concurrency. This value represents the maximum number of concurrent requests that an instance can process before App Runner scales up the service. The default configuration sets this at 100 requests; you can customize it by using the Custom configuration setting.

The next value is Minimum size, the number of instances that App Runner keeps provisioned for your service regardless of concurrent usage. This means that there may be times when some of these provisioned instances are not being used. You will be charged for the memory usage of all provisioned instances, but only for the CPU of those instances that are actually handling traffic. The default configuration sets the minimum size to 1 instance.

The last value is Maximum size. This value represents the maximum number of instances to which your service will scale; once your service reaches the maximum size there will be no additional scaling no matter the number of concurrent requests. The default configuration for maximum size is 25 instances.

If any of the default values do not match your needs, you will need to create a custom configuration, which gives you control over each of these values. To do this, select Custom configuration. This will display a drop-down that contains all of the App Runner auto scaling configurations you have available (currently it will only have "DefaultConfiguration") and an Add new button. Clicking this button will bring up the entry screen as shown below.

Customizing auto scaling in App Runner
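
If you would rather script this than click through the console, App Runner exposes the same three values through its API. Here is a hedged sketch using the AWSSDK.AppRunner NuGet package; the configuration name and limits are example values only.

using Amazon.AppRunner;
using Amazon.AppRunner.Model;

var client = new AmazonAppRunnerClient();

// Example values only - tune these to your own traffic profile.
var response = await client.CreateAutoScalingConfigurationAsync(new CreateAutoScalingConfigurationRequest
{
    AutoScalingConfigurationName = "prodotnetonaws-scaling",
    MaxConcurrency = 50,   // requests an instance handles before App Runner scales out
    MinSize = 1,           // instances kept provisioned at all times
    MaxSize = 10           // ceiling on the number of instances
});

Console.WriteLine(response.AutoScalingConfiguration.AutoScalingConfigurationArn);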

The next section after auto scaling is Health check. The first value you set in this section is the Timeout, the amount of time, in seconds, that the load balancer will wait for a health check response; the default timeout is 5 seconds. You can also set the Interval, the number of seconds between health checks of each instance, which defaults to 10 seconds. Finally, you can set the Unhealthy and Healthy thresholds. The unhealthy threshold is the number of consecutive health check failures after which an instance is considered unhealthy and needs to be recycled, and the healthy threshold is the number of consecutive successful health checks necessary for an instance to be considered healthy. These default to 5 checks for unhealthy and 1 check for healthy.

Next, in the Security section, you can assign an IAM role to the instance. This role will be used by the running application if it needs to communicate with other AWS services, such as S3 or a database server. The last section is Tags, where you can add one or more tags to the App Runner service.
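
All of this console configuration maps onto a single CreateService call if you want to automate the process. The following is a hedged sketch using the AWSSDK.AppRunner NuGet package; the image URI, role ARN, and sizing values are placeholders for whatever you created earlier.

using Amazon.AppRunner;
using Amazon.AppRunner.Model;

var client = new AmazonAppRunnerClient();

// Placeholder ARNs and URIs - substitute the values from your own account.
var response = await client.CreateServiceAsync(new CreateServiceRequest
{
    ServiceName = "ProDotNetOnAWS-AR",
    SourceConfiguration = new SourceConfiguration
    {
        AutoDeploymentsEnabled = false,   // the "Manual" deployment trigger
        AuthenticationConfiguration = new AuthenticationConfiguration
        {
            AccessRoleArn = "arn:aws:iam::123456789012:role/AppRunnerECRAccessRole"
        },
        ImageRepository = new ImageRepository
        {
            ImageIdentifier = "123456789012.dkr.ecr.us-east-2.amazonaws.com/prodotnetonaws:latest",
            ImageRepositoryType = ImageRepositoryType.ECR,
            ImageConfiguration = new ImageConfiguration { Port = "80" }
        }
    },
    InstanceConfiguration = new InstanceConfiguration
    {
        Cpu = "1 vCPU",
        Memory = "2 GB"
    }
});

Console.WriteLine($"Service status: {response.Service.Status}");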

Once you have finished configuring the service, clicking the Next button will bring you to the review screen. Clicking the Create and deploy button on this screen will give the approval for App Runner to create the service, deploy the container image, and run it so that the application is available. You will be presented with the service details page and a banner that informs you that “Create service is in progress.” This process will take several minutes, and when completed will take you to the properties page as shown below.

After App Runner is completed

Once the service is created and the status is displayed as Running, you will see a value for a Default domain which represents the external-facing URL. Clicking on it will bring up the home page for your containerized sample application.

There are five tabs displayed under the domain: Logs, Activity, Metrics, Configuration, and Custom domain. The Logs tab displays the Event, Deployment, and Application logs for this App Runner service. This is where you will be able to look for any problems during deployment or while running the container itself. Under the Event log section, you should be able to see the listing of events from the service creation. The Activity tab is very similar, in that it displays a list of activities taken by your service, such as creation and deployment.

The next tab, Metrics, tracks metrics related to the entire App Runner service. This is where you will be able to see information on HTTP connections and requests, as well as track the changes in the number of used and unused instances. By going into the sample application (at the default domain) and clicking around the site, you should see these values change and a graph become available that provides insight into these various activities.

The Configuration tab allows you to view and edit many of the settings that we set during the original service creation. There are two different sections where you can make edits. The first is at the Source and deployment level, where you can change the container source and whether deployments happen manually or automatically when the repository is updated. The second is at the Configure service level, where you can change your current settings for auto scaling, health checks, and security.

The last tab on the details page is Custom domain. The default domain will always be available for your application; however, it is likely that you will want other domain names pointed to it – we certainly wouldn't want to use https://SomeRandomValue.us-east-2.awsapprunner.com for our company's web address. Linking domains is straightforward from the App Runner side. Simply click the Link domain button and input the custom domain that you would like to link; we of course used "prodotnetonaws.com". Note that this does not include "www", because that usage is currently not supported through the App Runner console. Once you enter the custom domain name, you will be presented with the Configure DNS page as shown below.

Configuring a custom domain in App Runner

This page contains a set of certificate validation records that you need to add to your Domain Name System (DNS) so that App Runner can validate that you own or control the domain. You will also need to add CNAME records to your DNS that target the App Runner domain: one record for the custom domain and another for the www subdomain if desired. Once the certificate validation records are added to your DNS, the custom domain status will become Active and traffic will be directed to your App Runner instance. This validation can take anywhere from a few minutes to 48 hours.
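
This linking step can also be scripted. Below is a minimal sketch assuming the AWSSDK.AppRunner package, with a placeholder service ARN; note that the API also exposes a flag for the www subdomain that the console flow described above does not.

using Amazon.AppRunner;
using Amazon.AppRunner.Model;

var client = new AmazonAppRunnerClient();

var response = await client.AssociateCustomDomainAsync(new AssociateCustomDomainRequest
{
    ServiceArn = "arn:aws:apprunner:us-east-2:123456789012:service/ProDotNetOnAWS-AR/abc123", // placeholder
    DomainName = "prodotnetonaws.com",
    EnableWWWSubdomain = true
});

// The certificate validation records to add to your DNS provider
// (they may take a moment to populate; DescribeCustomDomains returns them as well).
foreach (var record in response.CustomDomain.CertificateValidationRecords)
{
    Console.WriteLine($"{record.Type} {record.Name} -> {record.Value}");
}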

Once your App Runner instance is up and running for the first time, there are several actions that you can take on it, shown in the upper right corner of the service details page a couple of images ago. The first is the orange Deploy button, which will deploy the container from either the container registry or the source code repository, depending upon your configuration. You can also Delete the service, which is straightforward, as well as Pause the service. There are some things to consider when you pause your App Runner service. The first is that your application will lose all state – much as if you were deploying a new service. The second consideration is that if you are pausing your service because of a code defect, you will not be able to redeploy a new (presumably fixed) container without first resuming the service.
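
These actions are also available through the API; here is a short hedged sketch with the AWSSDK.AppRunner package and a placeholder service ARN.

using Amazon.AppRunner;
using Amazon.AppRunner.Model;

var client = new AmazonAppRunnerClient();
var serviceArn = "arn:aws:apprunner:us-east-2:123456789012:service/ProDotNetOnAWS-AR/abc123"; // placeholder

// Pause the service - provisioned instances are released and the application loses its state.
await client.PauseServiceAsync(new PauseServiceRequest { ServiceArn = serviceArn });

// The service must be resumed before a new container version can be deployed.
await client.ResumeServiceAsync(new ResumeServiceRequest { ServiceArn = serviceArn });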

.NET and Containers on AWS (Part 1: ECR)

Amazon Elastic Container Registry (ECR) is a service designed to support storing and managing Docker and Open Container Initiative (OCI) images and OCI-compatible artifacts. ECR can act as a private container image repository where you store your own images, as a public container image repository for managing publicly accessible images, and as a way to manage access to other public image repositories. ECR also provides lifecycle policies, which allow you to manage the lifecycle of images within your repository; image scanning, so that you can identify any potential software vulnerabilities; and cross-region and cross-account image replication, so that your images can be available wherever you need them.

As with the rest of AWS services, ECR is built on top of other services. For example, Amazon ECR stores container images in Amazon S3 buckets so that server-side encryption comes by default. Or, if needed, you can use server-side encryption with KMS keys stored in AWS Key Management Service (AWS KMS), all of which you can configure as you create the registry. As you can likely guess, IAM manages access rights to the images, supporting everything from strict rules to anonymous access to support the concept of a public repository.

There are several different ways to create a repository. The first is through the ECR console by selecting the Create repository button. This will take you to the Create repository page as shown below:

Through this page you can set the Visibility settings, Image scan settings, and Encryption settings. There are two visibility settings, Private and Public. Private repositories are managed through permissions in IAM and are part of the AWS Free Tier, with 500 MB-month of storage for one year. Public repositories are openly visible and available for open pulls. Amazon ECR offers 50 GB-month of always-free storage for your public repositories, and you can transfer 500 GB of data to the internet for free from a public repository each month anonymously (without using an AWS account). If you authenticate to a public repository on ECR, you can transfer 5 TB of data to the internet for free each month, and you get unlimited bandwidth for free when transferring data from a public repository in ECR to AWS compute resources in any region.

Enabling Scan on push means that every image that is uploaded to the repository will be scanned. This scanning is designed to help identify any software vulnerabilities in the uploaded image and will automatically run every 24 hours, but turning this on ensures that the image is checked before it can ever be used. The scanning is done using the Common Vulnerabilities and Exposures (CVEs) database from the Clair project, outputting a list of scan findings.

Note: Clair is an open-source project that was created for the static analysis of vulnerabilities in application containers (currently including OCI and docker). The goal of the project is to enable a more transparent view of the security of container-based infrastructure – the project was named Clair after the French term which translates to clear, bright, or transparent.
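
Once scan on push is enabled, the findings can also be pulled back programmatically. The following is a minimal sketch assuming the AWSSDK.ECR NuGet package, using the repository name and tag that we create later in this walkthrough.

using Amazon.ECR;
using Amazon.ECR.Model;

var client = new AmazonECRClient();

// Retrieve the scan results for the image tagged "latest".
var response = await client.DescribeImageScanFindingsAsync(new DescribeImageScanFindingsRequest
{
    RepositoryName = "prodotnetonaws",
    ImageId = new ImageIdentifier { ImageTag = "latest" }
});

foreach (var finding in response.ImageScanFindings.Findings)
{
    Console.WriteLine($"{finding.Severity}: {finding.Name}");
}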

The last section is Encryption settings. When this is enabled, as shown below, ECR will use AWS Key Management Service (KMS) to manage the encryption of the images stored in the repository, rather than the default encryption approach.

You can use either the default settings, where ECR creates a default key (with an alias of aws/ecr) or you can Customize encryption settings and either select a pre-existing key or create a new key that will be used for the encryption.

Other approaches for creating an ECR repo

Just as with all the other services that we have talked about so far, there is the UI-driven way to build an ECR repository, as we just went through, and then several other approaches to creating a repo.

AWS CLI

You can create an ECR repository in the AWS CLI using the create-repository command as part of the ECR service.

C:\>aws ecr create-repository ^
    --repository-name prodotnetonaws ^
    --image-scanning-configuration scanOnPush=true ^
    --region us-east-1

You can control all of the basic repository settings through the CLI just as you can when creating the repository through the ECR console, including assigning encryption keys; the command's output includes the generated repository URI.

AWS Tools for PowerShell

And, as you probably aren’t too surprised to find out, you can also create an ECR repository using AWS Tools for PowerShell:

C:\> New-ECRRepository `
    -RepositoryName prodotnetonaws `
    -ImageScanningConfiguration_ScanOnPush $true

Just as with the CLI, you have the ability to fully configure the repository as you create it.
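
The same can be done from .NET code itself using the AWSSDK.ECR NuGet package; the sketch below mirrors the CLI and PowerShell examples above.

using Amazon.ECR;
using Amazon.ECR.Model;

var client = new AmazonECRClient();

var response = await client.CreateRepositoryAsync(new CreateRepositoryRequest
{
    RepositoryName = "prodotnetonaws",
    ImageScanningConfiguration = new ImageScanningConfiguration { ScanOnPush = true }
});

// The URI returned here is the same value you would later copy from the console.
Console.WriteLine(response.Repository.RepositoryUri);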

AWS Toolkit for Visual Studio

You can also create a repository using the AWS Toolkit for Visual Studio, although you must depend upon the extension's built-in default values, because the only thing that you can control through the AWS Explorer is the repository name, as shown below. As you may notice, the AWS Explorer does not have its own node for ECR and instead puts the Repositories sub-node under the Amazon Elastic Container Service (ECS) node. This is a legacy from the past, before ECR really became its own service, but it is still an effective way to access and work with repositories in Visual Studio.

Once you create a repository in VS, going to the ECR console and reviewing the repository that was created will show you that it used the default settings, so it was a private repository with both “Scan on push” and “KMS encryption” disabled.

At this point, the easiest way to show how this will all work is to create an image and upload it into the repository. We will then be able to use this container image as we go through the various AWS container management services.

Note: You will not be able to complete many of these exercises without Docker already running on your machine. You will find download and installation instructions for Docker Desktop at https://www.docker.com/products/docker-desktop. Once you have Docker Desktop installed you will be able to locally build and run container images.

We will start by creating a simple ASP.NET Core sample web application in Visual Studio through File -> New Project and selecting the ASP.NET Core Web App (C#) project template. You then name your project and select where to store the source code. Once that is completed you will get a new screen that asks for additional information as shown below. The Enable Docker checkbox defaults to unchecked, so make sure you check it and then select the Docker OS to use, which in this case is Linux.

This will create a simple solution that includes a Docker file as shown below:

If you look at the contents of that generated Docker file you will see that it is very similar to the Docker file that we went through earlier, containing the instructions to restore and build the application, publish the application, and then copy the published application bits to the final image, setting the ENTRYPOINT.

FROM mcr.microsoft.com/dotnet/aspnet:5.0 AS base
WORKDIR /app
EXPOSE 80
EXPOSE 443

FROM mcr.microsoft.com/dotnet/sdk:5.0 AS build
WORKDIR /src
COPY ["SampleContainer/SampleContainer.csproj", "SampleContainer/"]
RUN dotnet restore "SampleContainer/SampleContainer.csproj"
COPY . .
WORKDIR "/src/SampleContainer"
RUN dotnet build "SampleContainer.csproj" -c Release -o /app/build

FROM build AS publish
RUN dotnet publish "SampleContainer.csproj" -c Release -o /app/publish

FROM base AS final
WORKDIR /app
COPY --from=publish /app/publish .
ENTRYPOINT ["dotnet", "SampleContainer.dll"]

If you look at your build options in Visual Studio as shown below, you will see additional ones available for containers. The Docker choice, for example, will work through the Docker file and start the final container image within Docker and then connect the debugger to that container so that you can debug as usual.

The next step is to create the container image and persist it into the repository. To do so, right click the project name in the Visual Studio Solution Explorer and select Publish Container to AWS to bring up the Publish Container to AWS wizard as shown below.

The image above shows that the repository that we just created is selected as the repository for saving, and the Publish only the Docker image to Amazon Elastic Container Registry option in the Deployment Target was selected (these are not the default values for each of these options). Once you have this configured, click the Publish button. You’ll see the window in the wizard grind through a lot of processing, then a console window may pop up to show you the actual upload of the image, and then the wizard window will automatically close if successful.

Logging into Amazon ECR and going into the prodotnetonaws repository (the one we uploaded the image into), as shown below, will demonstrate that there is now an image available within the repository with a latest tag, just as configured in the wizard. You can click the icon with the Copy URI text to get the URL that you will use when working with this image. We recommend that you go ahead and do this at this point and paste it somewhere easy to find as that is the value you will use to access the image.
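
If you would rather look that URI up programmatically than copy it from the console, the repository can be queried through the same AWSSDK.ECR package; a small hedged sketch:

using System.Collections.Generic;
using Amazon.ECR;
using Amazon.ECR.Model;

var client = new AmazonECRClient();

var response = await client.DescribeRepositoriesAsync(new DescribeRepositoriesRequest
{
    RepositoryNames = new List<string> { "prodotnetonaws" }
});

// e.g. 123456789012.dkr.ecr.us-east-1.amazonaws.com/prodotnetonaws
Console.WriteLine(response.Repositories[0].RepositoryUri);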

Now that you have a container image stored in the repository, we will look at how you can use it in the next article.