AWS Schema Conversion Tool

The AWS Schema Conversion Tool is designed to make cross-engine database migrations more predictable. It does this by automatically converting not only the source data, but also most of the database objects such as views, stored procedures, and functions. If you think back to the previous section, you may recall that there was no mention of those database objects; the objective was simply to move all of the database tables. And, since that is all our sample database had, that was quite sufficient. However, many “enterprisey” databases will have these database objects. The Schema Conversion Tool will help you with those.

First, the Schema Conversion Tool is a downloadable application, available for Microsoft Windows, Fedora Linux, and Ubuntu Linux. You can access the download links at https://aws.amazon.com/dms/schema-conversion-tool. We will use the Windows version of the tool for our walkthrough. Second, the tool will only migrate relational data into Amazon RDS or Amazon Redshift. Table 1 displays the source and target database combinations supported by the tool.

Source | Aurora MySQL | Aurora PGSQL | MariaDB | MySQL | PGSQL | SQL Server | Redshift
Oracle | X | X | X | X | X | |
Oracle Data Warehouse | | | | | | | X
Azure SQL Database | X | X | | X | X | |
Microsoft SQL Server | X | X | X | X | X | X | X
Teradata | | | | | | | X
IBM Netezza | | | | | | | X
Greenplum | | | | | | | X
HPE Vertica | | | | | | | X
MySQL | | X | | | X | X |
PostgreSQL (PGSQL) | X | X | | X | X | |
IBM DB2 LUW | X | X | X | X | X | |
IBM Db2 for z/OS | X | X | | X | X | |
SAP ASE | X | X | X | X | X | |
Amazon Redshift | | | | | | | X
Azure Synapse Analytics | | | | | | | X
Snowflake | | | | | | | X
Table 1. Databases available as sources and targets for the Schema Conversion Tool

Clicking the download tool link will start downloading a zip file. Once the file is downloaded, extract its contents to a working directory. There will be a .msi installation file and two folders. Run the installation file and start the application when the installation is complete.

Configuring the Source

Upon your first run of the tool, you will be presented with the terms of use. Accepting these terms will open the application and present the Create a new database migration project screen as shown in Figure 1.

Figure 1. Create a new database migration project screen in SCT

We selected Microsoft SQL Server as our source engine, which enabled the three radio buttons that give some direction as to how the conversion process should proceed. The three choices are:

·         I want to switch engines and optimize for the cloud (default)

·         I want to keep the same engine but optimize for the cloud

·         I want to see a combined report for database engine switch and optimization to cloud.

Each of these selections will alter the logic of the migration project. For example, selecting to keep the same engine will present you with a different set of destinations than selecting to switch engines.

Completing the fields in Step 1 and clicking the Next button will take you to the Step 2 – Connect to the source database screen as shown in Figure 2.

Figure 2. Specifying connection information for the source database

As shown in Figure 2, there are four fields marked with a red tick in their upper left corner. Those fields are required for the connection, and three of them should be well familiar by now: the Server Name, User name, and Password (when accessing the source database using SQL Server Authentication). The last field, Microsoft SQL Server driver path, is a new one; it points to the directory in which the Microsoft SQL Server JDBC driver is located, which we did not have installed. Fortunately, AWS helpfully provides a page with links to the various database drivers at https://docs.aws.amazon.com/SchemaConversionTool/latest/userguide/CHAP_Installing.html. You will need to install drivers for both your source and target databases, so we downloaded the drivers for SQL Server (our source database) and for Amazon Aurora MySQL (our destination database). Once the appropriate JDBC drivers are installed, you can point to the SQL Server driver path as shown in Figure 3.

Figure 3. Specifying SQL Server JDBC path

Once you have your server, authentication, and driver path filled out, you can click the Test connection button to ensure everything works as expected. If the test is successful, you can select Next to continue.

Note: Our Microsoft SQL Server JDBC download contained three different .jar files, jre8, jre11, and jre17. The tool would allow the selection of jre8 and jre11 but would not allow the selection of the jre17 file. This will likely change as the tool continues to evolve.

The tool will next display a screen indicating that it is loading the metadata for the server. This metadata includes databases, schemas, and triggers. Once that loading is completed, you will be in Step 3. Choose a schema, where you will get a list of all databases and the schemas available within each one. This list includes all of the system databases, such as master, model, msdb, and tempdb. You will probably not want to include those! Once you have selected the schema(s), click the Next button. You will see the “Loading metadata” screen again as the tool gets all the database objects based upon your selected schema(s). This process will take a few minutes.

Database Migration Assessment Screen

Once completed, you will be taken to the Step 4. Run the Database migration assessment screen. The first thing that you will see is the assessment report. The tool creates this report by taking all the metadata that it found and analyzing how well it would convert to the various target databases. At the top of the report is the Executive summary. This lists all of the potential target platforms and summarizes the types of actions that need to be taken. An example of this report is shown in Figure 4.

Figure 4. Executive summary of the migration assessment report

Immediately under the executive summary is a textual analysis of the data in the chart. Each of the line items is described with an estimate of the percentage of database storage objects and database code objects that can be automatically converted. In our case, both Amazon RDS for MySQL and Amazon Aurora (MySQL compatible) can be converted at 100%. None of the other target platforms scored that high.

Additional detail is displayed further down the page as shown in Figure 5.

Figure 5. Details on migrating to Amazon Aurora MySQL

This section demonstrates that 1 schema, 10 tables, 17 constraints, 7 procedures and 1 scalar function can be successfully converted to Amazon Aurora (MySQL compatible).

Configuring the Destination

Once you have completed your review of the potential destinations, click the Next button. This will bring you to the Step 5. Choose a target page, where you select the target engine and configure the connection to the target database. When we got to the page, Amazon RDS for MySQL was already selected as the target engine, so we went with that and created a new Amazon RDS for MySQL instance in the RDS console, making sure that we enabled external access. Filling out the connection information and clicking the Test connection button demonstrated that we had filled the information out appropriately, so we clicked the Finish button.

Completing the Migration

This brings you to the project page as shown in Figure 6.

Figure 6. Schema conversion tool conversion project

Just like with DMS, the conversion tool gives you the ability to add mapping and transformation rules. You do this by clicking on the Main view icon in the toolbar and selecting the Mapping view. This changes the center section of the screen. In this section you can add Transformation rules. These transformation rules, just as with DMS, allow you to alter the name of items that are going to be migrated. You can create a rule where you create the appropriate filters to determine which objects will be affected, and you have the following options on how the names will be changed:

·         Add prefix

·         Add suffix

·         Convert lowercase

·         Convert uppercase

·         Move to

·         Remove prefix

·         Remove suffix

·         Rename

·         Replace prefix

·         Replace suffix

These different transformations are useful when working with database schemas that use older approaches, such as putting a prefix of “t” before the name to show that the object is a table, or “v” to indicate that it’s a view. We will not be using any transformations as part of our conversion.

Since we are converting our ProDotNetOnAWS database and its dbo schema, you need to go to the left window where the SQL Server content is displayed, right-click on the dbo schema, and select Convert schema from the popup menu. You will get an additional popup that shows the copying of the schema to the destination. Once completed, the right window will look like Figure 7, showing that the schema has been copied over along with tables, procedures, views, and functions (if you have all of those).

Figure 7. Schema converted to target database

Note that this has not yet been applied to the destination server and is instead a local representation of what it would look like once applied. Your next step is to apply the changes to the destination. You do this by right-clicking on the destination schema and selecting Apply to database. This will bring up a pop-up confirmation window, after which you will see the schema being processed. The window will close once completed.

At this point, your schema has been transferred to the target database. Figure 8 shows the destination database in MySQL Workbench, and you can see that the schema defined in the tool has been successfully migrated.

Figure 8. Viewing converted schema in MySQL Workbench

Once your data has been migrated, the last step is to convert your code so that it can access your new database.

AWS Database Migration Service (Part 2 of 2)

In the previous article we went over the high-level approach to using AWS DMS, created the replication instance on which your migration processing will run, and created the source and target endpoints that manage the connections to the source and target databases. The last step is to create the database migration task.

Creating your Database Migration Task

So far, we have defined the resource set that is going to do the work as well as the places where data will be coming from and where it will be going. There is one area that we have not yet defined, and that is the database migration task. This task defines the work that will be done. As part of this task, you can specify which tables to use, define any special processing, configure logging, etc. Let’s take a look at creating one of these tasks.

First, go into the Database migration tasks screen in the AWS DMS console and then click the Create task button. This will bring up the creation screen, with the first section being Task configuration. This section allows you to:

·         Provide a Task identifier or name for the task

·         Select the Replication instance to use

·         Select the Source database endpoint

·         Select the Target database endpoint

·         Select the Migration type

The Migration type is where you tell DMS the kind of work that you want this task to perform. There are three different options that you can select. The first is Migrate existing data. Using this migration type means that you’re looking to do a one-time copy of the data, which is ideal for a one-time migration. The next option is Migrate existing data and replicate ongoing changes. The name pretty much describes what is going on with this approach, and it is most appropriate when you need to run both the source and target systems in parallel but want them to stay as updated as possible. This approach is especially common in data lake scenarios where data is being moved from a transactional system to an analytics or reporting system. The last migration type option is Replicate data changes only, where you replicate any changes in data but do not perform that one-time migration.

The next major section to complete when creating a migration task is the Task settings. Task settings control the behavior of your task and can be configured through a Wizard or through a JSON editor. We will use the wizard mode so that we can more easily talk about the major settings.

The first item to configure is the Target table preparation mode, or how DMS should prepare the tables at the target endpoint. There are three options: Do nothing, Drop tables on target, and Truncate. When you select Do nothing, existing target tables are left untouched, and any tables that do not exist are created. When you select Drop tables on target, DMS will drop and recreate all affected tables. Truncate means that all tables and metadata remain, but all of the data is removed.

The next item to configure is Include LOB columns in replication. LOBs are large objects, and you can choose whether or not to include those columns in the target data. You have three options, the first of which is Don’t include LOB columns, and the second of which is Full LOB mode; both are rather straightforward. The third option is Limited LOB mode. In this mode, DMS will truncate each LOB to a defined size, the Maximum LOB size (KB) value.

You then can configure whether you want to Enable validation. Checking this box will cause DMS to compare the source and target data immediately after the full load is performed. This ensures your data is migrated correctly, but it takes additional time to perform and thus increases cost. You can next Enable CloudWatch logs. There are also some advanced task settings, but we won’t go into those as part of this discussion.
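
If you use the JSON editor instead of the wizard, these same choices appear as keys in the task settings document. Below is a trimmed sketch of how the settings above might look in JSON (other sections omitted; the values shown are illustrative, not recommendations):

{
  "FullLoadSettings": {
    "TargetTablePrepMode": "DROP_AND_CREATE"
  },
  "TargetMetadata": {
    "SupportLobs": true,
    "FullLobMode": false,
    "LimitedSizeLobMode": true,
    "LobMaxSize": 32
  },
  "ValidationSettings": {
    "EnableValidation": true
  },
  "Logging": {
    "EnableLogging": true
  }
}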

The next section is Table mappings. This section is where you define the rules about what data is moved and how it is moved. At a high level, you will create a Selection rule, which determines the data that you wish to replicate, and then you can create a Transformation rule that modifies the selected data before it is provided to the destination endpoint. The table mappings section also gives you the opportunity to use a Wizard approach or a JSON editor to enter all table mappings in JSON. We will walk through using the wizard.

The first step is to select the Add new selection rule button. This expands the selection rule section as shown in Figure 1.

Figure 1. Creating selection rules for a database migration task

Expanding the Schema drop-down will show that there is only one option – to Enter a schema. Selecting this option will add another textbox in which you can provide the Source name. This allows you to limit, by schema, the data that is being selected. You can enter % to select all schemas in the database or enter the schema name. You do the same for the Source table name, entering % if you want all the tables replicated. Once you have those defined, you then select the appropriate Action, to either Include or Exclude the items that fit your selection criteria. You can create as many rules as desired; however, you must always have at least one rule with an include action.
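
For reference, here is how the same kind of rule looks in the JSON editor, following the DMS table-mapping format. This minimal sketch includes every table in a dbo schema (the names are examples only):

{
  "rules": [
    {
      "rule-type": "selection",
      "rule-id": "1",
      "rule-name": "include-dbo-tables",
      "object-locator": {
        "schema-name": "dbo",
        "table-name": "%"
      },
      "rule-action": "include"
    }
  ]
}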

Once you have the selection rule configured, you can Add column filter. This allows you to limit the number and type of records that are replicated. A column filter requires the Column name, one or more Conditions, and one or more comparison values. You have the following options for the conditions:

·         Less than or equal to

·         Greater than or equal to

·         Equal to

·         Not equal to

·         Equal to or between two values

·         Not between two values

·         Null

·         Not null

You can create any number of column filters for each selection rule.
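
Column filters appear in the JSON editor as a filters array attached to a selection rule. Here is a sketch assuming a hypothetical OrderDate column on a dbo.Orders table:

{
  "rule-type": "selection",
  "rule-id": "2",
  "rule-name": "orders-since-2022",
  "object-locator": {
    "schema-name": "dbo",
    "table-name": "Orders"
  },
  "rule-action": "include",
  "filters": [
    {
      "filter-type": "source",
      "column-name": "OrderDate",
      "filter-conditions": [
        {
          "filter-operator": "gte",
          "value": "2022-01-01"
        }
      ]
    }
  ]
}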

Once you have completed your selection rule, you can then add one or more Transformation rules. These rules allow you to change or transform the schema, table, or column names of some or all of the items that you have selected. Since we are simply copying the database across, we do not need to add any of these, especially since any changes would likely break our code!
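
Should you ever need one, a transformation rule in the JSON editor looks much like a selection rule plus an action describing the name change. This sketch would add a “t_” prefix to every table name (illustrative only):

{
  "rule-type": "transformation",
  "rule-id": "3",
  "rule-name": "prefix-tables",
  "rule-target": "table",
  "object-locator": {
    "schema-name": "dbo",
    "table-name": "%"
  },
  "rule-action": "add-prefix",
  "value": "t_"
}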

Your next option is to determine whether you want to Enable premigration assessment run. This will warn you of any potential migration issues. Checking the box will expand the UI and present you with a set of Assessments to run as shown in Figure 2.

Figure 2. Enabling premigration assessment run on a scheduled task

Once you have all of your selection and transformation rules created, you can choose to Start migration task either Automatically on create (the default) or Manually later. Lastly, add any tags that you desire and click the Create task button.

This will bring you back to the database migration tasks list screen, where you will see your task being created. Once created, you can either start the task manually or allow it to start itself if so configured. You will be able to watch the table count move from Tables queued to Tables loading to Tables loaded as they are processed. Returning to the AWS DMS Dashboard will show that there is 1 Load complete, as shown in Figure 3.

Figure 3. Dashboard showing completed migration task

For those cases where you simply want to migrate data sets with minimal changes, other than perhaps renaming some columns, the Database Migration Service works like a dream. It is relatively painless to set up and powerful enough to move data between servers, even servers of dissimilar types, such as when we copied data from SQL Server to Amazon Aurora. However, there is a tool that will help you move more disparate data between different database engines. We will take a look at that tool next.

AWS Database Migration Service (Part 1 of 2)

The AWS Database Migration Service (AWS DMS) was designed to help quickly and securely migrate databases into AWS. The premise is that the source database remains available during the migration to help minimize application downtime. AWS DMS supports homogeneous migrations such as SQL Server to SQL Server or Oracle to Oracle as well as some heterogeneous migrations between different platforms. You can also use the service to continuously replicate data from any supported source to any supported target, meaning you can use DMS for both one-time replications and ongoing replications. AWS DMS works with relational databases and NoSQL databases as well as other types of data stores. One thing to note, however, is that at least one end of your migration must be on an AWS service; you cannot use AWS DMS to migrate between two on-premises databases.

How Does it Work?

You can best think of DMS as replication software running on a server in the cloud. There are literally dozens of these kinds of tools, some cloud-based, some that you install locally to move data between on-premises systems. DMS’s claim to fame is that you only pay for the work that you have it perform – there is no licensing fee for the service itself as with most of the other software solutions.

Figure 1 shows DMS at a high level. The green box in Figure 1 is the overall service and contains three major subcomponents. Two of these are endpoints used to connect to the source and target databases, and the third is the replication instance.

Figure 1. A high-level look at AWS Data Migration Service

The replication instance is an Amazon EC2 instance that provides the resources necessary to carry out the database migration. Since it is a replication instance, you can get high availability and failover support if you select a Multi-AZ deployment.

AWS DMS uses this replication instance to connect to your source database through the source endpoint. The instance then reads the source data and performs any data formatting necessary to make it compatible with the target database. The instance then loads that data into the target database. Much of this processing is done in memory; however, large data sets may need to be buffered to disk as part of the transfer. Logs and other replication-specific data are also written onto the replication instance.

Creating a Replication Instance

Enough about the way that it is put together; let’s jump directly into creating a migration service, and we will go over the various options as they come up in the process.

Note: Not all EC2 instance classes are available for use as a replication instance. As of the time of this writing, only T3 (general purpose), C5 (compute-optimized), and R5 (memory-optimized) Amazon EC2 instance classes can be used. You can use a t3.micro instance under the AWS Free Tier, however, there is a chance that you may be charged if the utilization of the instance over a rolling 24-hour period exceeds the baseline utilization. This will not be a problem in our example, but it may be with other approaches, especially if you use ongoing replication.

You can get to the AWS DMS console by searching for “DMS” or by going into the Migration & Transfer service group and selecting it there. Click the Create replication instance button once you get to the console landing page. This will take you to the creation page. Remember as you go through this that all we are doing here is creating the EC2 instance that DMS will use for processing, so all the questions will be around that.

The fields that you can enter in the Replication instance configuration section are:

·         Name – must be unique across all replication instances in the current region

·         Descriptive Amazon Resource Name (ARN) – This field is optional, but it allows you to use a friendly name for the ARN rather than the typical set of nonsense that AWS creates by default. This value cannot be changed after creation.

·         Description – Short description of the instance

·         Instance class – This is where you select the instance class on which your migration process will be running.

·         Engine version – This option allows the targeting of previous versions of DMS, or the software that runs within the instance class – though we have no idea why you would ever target an older version.

·         Allocated storage – The amount of storage space that you want in your instance. This is where items like log files will be stored; it will also be used for disk caching if the instance’s memory is not sufficient to handle all of the processing.

·         VPC – Where the instance should be run.

·         Multi AZ – You can choose between Production workload which will set up multi-AZ or Dev or test workload which will create the instance in a single AZ.

·         Publicly accessible – This is necessary if you are looking to connect to databases outside of your VPC, or even outside of AWS.

There are three additional sections that you can configure. The first of these is Advanced security and network configuration, where you can define the specific subnet group for your replication instance, the availability zone in which it should run, the VPC security groups you want assigned to it, and the AWS Key Management Service key that you would like used.

The next section is Maintenance, where you can define the weekly maintenance window that AWS will use for maintaining the DMS engine software and operating system. A maintenance window must be configured; AWS will set up a default window for you. The last section that you can configure is, of course, Tags.

Once you click the Create button you will see that your replication instance is being created as shown in Figure 2. This creation process will take several minutes.

Figure 2. Creating a DMS replication instance

Now that you have a replication instance, the next step is to create your endpoints.

Creating your Source and Target Endpoints

As briefly mentioned above, the endpoints manage the connections to your source and target databases. They are managed independently from the replication instance because there are many cases where multiple replications talk to a single source or target, such as copying one set of data to one target and another set of data from the same source to a second target, as shown in Figure 3.

Figure 3. Multiple replications against a single source endpoint

To create an endpoint, go into Endpoints and select Create endpoint. This will bring up the Create endpoint screen. Your first option is to define the Endpoint type, as shown in Figure 4.

Figure 4. Endpoint type options when creating a DMS endpoint

Your first option when creating the endpoint is to determine whether it is going to be a source or target endpoint. You would think that this wouldn’t really matter, because a database connection is a database connection whether you are reading or writing, but DMS has made decisions around which databases it will support reading from and which databases you can write to, and, as you can likely predict, they are not the same list. Table 1 lists the different databases supported for each endpoint type, as of the time of this writing.

Database | As Source | As Target
Oracle v10.2 and later | X | X
SQL Server 2005 and later | X | X
MySQL 5.5 and later | X | X
MariaDB 10.0.24 and later | X | X
PostgreSQL 9.4 and later | X | X
SAP Adaptive Server Enterprise (ASE) 12.5 and above | X | X
IBM DB2 multiple versions | X |
Redis 6.x | | X
Azure SQL Database | X |
Google Cloud for MySQL | X |
All RDS instance databases | X |
Amazon S3 | X |
Amazon DocumentDB | X |
Amazon OpenSearch Service | | X
Amazon ElastiCache for Redis | | X
Amazon Kinesis Data Streams | | X
Amazon DynamoDB | | X
Amazon Neptune | | X
Apache Kafka | | X
Table 1. Databases available as sources and targets

The next option in the Endpoint type section is a checkbox to Select RDS DB instance. Checking this box will bring up a dropdown containing a list of RDS instances as shown in Figure 5.

Figure 5. Selecting an RDS database when creating an endpoint

The next section is the Endpoint configuration. There are two primary parts to this section: the first allows you to name the endpoint and select the type of database to which you are connecting, and the second is Endpoint settings, where you can define any additional settings needed to access a specific database. Selecting the Source or Target engine will expand the form, adding some additional fields.

The first of these fields is Access to endpoint database. There are two options available and the choice you make will change the rest of the form. These two options are AWS Secrets Manager, where you use stored secrets for the login credentials, or Provide access information manually where you manually configure the database connection.

Selecting to use AWS Secrets Manager will bring up additional fields as described below. These fields are used to fetch and access the appropriate secret.

·         Secret ID – the actual secret to be used when logging into the database

·         IAM role – the IAM role that grants Amazon DMS the appropriate permissions to use the necessary secret

·         Secure Socket Layer (SSL) mode – whether to use SSL when connecting to the database.
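
The secret that the Secret ID points to holds the connection details themselves. As a rough sketch, an RDS-style database secret contains keys along these lines (all names and values here are illustrative placeholders):

{
  "username": "admin",
  "password": "password123",
  "host": "my-database.xxxxxxxx.us-west-2.rds.amazonaws.com",
  "port": 1433
}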

Selecting to Provide access information manually brings up the various fields necessary to connect to that identified engine. Figure 6 shows what this looks like when connecting to a SQL Server, and hopefully, all these values look familiar because we have used them multiple times in earlier articles.

Figure 6. Providing SQL Server information manually for an endpoint
Figure 6. Providing SQL Server information manually for an endpoint

The next section is the Endpoint settings section. The purpose of this section is to add any additional settings that may be necessary for this particular instance of the database to which it is connecting. There are two ways in which you can provide this information. The first is through a Wizard, while the second is through an Editor. When using the Wizard approach, clicking the Add new setting button will bring up a Setting \ Value row, with the Setting being a drop-down list of known settings as shown in Figure 7. These values will be different for each engine as well as whether you are using the endpoint as a source or a target.

Figure 7. Endpoint settings section when creating a SQL Server endpoint
Figure 7. Endpoint settings section when creating a SQL Server endpoint

Selecting to use the Editor approach will bring up a large text box where you can enter the endpoint settings in JSON format. This would likely be the best approach if you need to configure multiple DMS endpoints with the same additional settings.

Once you have the Endpoint configuration section complete, the next section is KMS key, where you select the appropriate key to be used when encrypting the information that you have input into the configuration. The next section is Tags. The last section, entitled Test endpoint connection (optional), is shown in Figure 8 and is where you can test all the information that you have just filled out.

Figure 8. Testing an endpoint configuration

There are two values that you must identify before you can run the test: the VPC and the replication instance that you want to use, which is why we had you create the replication instance first! These are necessary because they are the resources that will be used to perform the work of connecting to the database. Once the values are selected, click the Run test button. After a surprising amount of time, during which you see indications that the test is running, you should get confirmation that your test was successful. This output is shown in Figure 9.

Figure 9. Successful test on an endpoint configuration

Obviously, you will need to configure at least one source endpoint and one target endpoint before you can run DMS end to end. However, you also need to make sure that you have each of them configured before you can configure the database migration task. We’ll finish that up in the next article!

Amazon RDS Oracle for .NET Developers

The last database available in RDS that we will go over is the oldest commercial SQL-based database management system, Oracle. While originally strictly relational, Oracle is now considered a multi-model database management system, which means that it can support multiple data models, such as document, graph, relational, and key-value, rather than simply supporting relational data like many of the systems we have been talking about up until now. It is also the database of choice for many different packaged software systems and is generally believed to have the largest RDBMS market share (based on revenue) – which means that it would not be surprising to be a .NET developer and yet be working with Oracle. And Amazon RDS makes it easy to do that in the cloud.

Oracle and .NET

Let’s first talk about using Oracle as a .NET developer. Since Oracle is a commercial database system, unlike the rest of the systems we have talked about in this series, it has a lot of additional tools designed to help .NET developers interact with Oracle products. The first of these is the Oracle Developer Tools for Visual Studio.

Oracle Developer Tools for Visual Studio

There are a lot of .NET applications based upon Oracle, which means that it is to Oracle’s advantage to make that interaction as easy as possible. One of the ways that they did this was to create the Oracle Developer Tools for Visual Studio (ODT for VS). This tool runs within Visual Studio 2017 or 2019 (2022 was not supported at the time of this writing) and brings in features designed to provide insight and improve the developer experience. Examples of the features within this tool include:

·         Database browsing – Use Server Explorer to browse your Oracle database schemas and to launch the appropriate designers and wizards to create and alter schema objects.

·         Schema comparison – View differences between two different schemas and generate a script that can modify the target schema to match the source schema. You can do this by connecting to live databases or by using scripts within an Oracle Database project.

·         Entity Framework support – Use Visual Studio’s Entity Designer for Database First and Model First object-relational mapping. (“Code First” is also supported).

·         Automatic code generation – You can use various windows, designers, and wizards to drag and drop and automatically generate .NET code.

·         PL/SQL Editor and debugger – Allows you to take advantage of Visual Studio’s debugging features from within PL/SQL code, including seamlessly stepping from .NET code into your PL/SQL code and back out again.

You need to have a free Oracle account before you can download the tools from https://www.oracle.com/database/technologies/net-downloads.html. Please note that installing these tools will also install functionality to interact with Oracle Cloud, but those details are for a different article! Once the tools are downloaded and installed you will see a new section in your Tools menu as shown in Figure 1.

Figure 1. New features added to Tools menu by ODT for VS

You will also find four new project templates added to the Create a new project wizard:

·         Visual C# Oracle CLR project – creates a C#-based project for creating classes to use in an Oracle database

·         Visual Basic Oracle CLR project – creates a Visual Basic project for creating classes to use in an Oracle database

·         Oracle Database project – creates a project for maintaining a set of scripts that can be generated using Server Explorer menus. This project type does NOT support schema comparison.

·         Oracle Database project Version 2 – creates a project for maintaining a standardized set of SQL scripts that represent your Oracle database schema. This project type supports schema comparison.

There are additional features to these tools, so suffice it to say that Oracle provides various ways to help .NET developers interact with their Oracle databases. Lots of ways. Many more than you will find for any of the other databases we have looked at in this series. And it should not surprise you to find that they also support connecting to Oracle databases from within your .NET application.

Oracle Data Provider for .NET (ODP.NET)

Where the ODT for VS is designed to help improve a developer’s productivity when interacting with Oracle databases, ODP.NET instead manages the interconnectivity between .NET applications and Oracle databases. ODP.NET does this by providing several NuGet packages: Oracle.ManagedDataAccess.Core and Oracle.EntityFrameworkCore, which support .NET 5 and more recent versions, and Oracle.ManagedDataAccess and Oracle.ManagedDataAccess.EntityFramework, which support .NET versions prior to 5.0. Once you have the packages, the next thing that you need to do is to configure your application to use Oracle. You do this by using the UseOracle method when overriding the OnConfiguring method in the context class as shown below:

protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
{
    optionsBuilder.UseOracle("connection string here");
}

A connection string for Oracle has three required fields:

·         User Id – username for use with the connection

·         Password – password

·         Data Source – the Transparent Network Substrate (TNS) name, which is the name of the entry in the tnsnames.ora file for the database server. This file can be found in the $ORACLE_HOME/network/admin directory.

This makes managing a connection string seem like an easy task. However, of course, there is a caveat – you must be willing to deploy a file that has to be in a very specific place on the server and contain a reference to the server to which you need to connect. If you are okay with that approach, then this is a simple connection string – “user id=prodotnetonaws;password=password123;data source=OrcleDB”. However, since a lot of the flexibility inherent in the cloud will go away if you make this a requirement (you are no longer deploying just your application), you will instead have to build a much uglier connection string using a Connect Descriptor:

“user id=prodotnetonaws;password=password123;data source=(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=servernamehere)(PORT=1521))(CONNECT_DATA=(SID=databasename)))”

This means that we will need to build our connection string with additional values:

  • Host – The address of the server to which the application will connect
  • SID – The database, on the host server, to which the application is connecting
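
Rather than concatenating the descriptor by hand, the OracleConnectionStringBuilder class from the Oracle.ManagedDataAccess package can assemble it for you. Here is a minimal sketch using the same placeholder values as the string above:

// Requires: using Oracle.ManagedDataAccess.Client;
protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
{
    // Build the connect-descriptor connection string from its parts
    // (all values below are the same placeholders used above)
    var builder = new OracleConnectionStringBuilder
    {
        UserID = "prodotnetonaws",
        Password = "password123",
        DataSource = "(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=servernamehere)" +
                     "(PORT=1521))(CONNECT_DATA=(SID=databasename)))"
    };
    optionsBuilder.UseOracle(builder.ConnectionString);
}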

Let’s now set up our Oracle database and see where those values come from.

Setting up an Oracle Database on Amazon RDS

Now that we know how to set up our .NET application to access an Oracle database, let’s go look at setting up an Oracle instance. First, log into the console, go to RDS, and select Create database. On the Create Database screen, select Standard create and then Oracle. This will bring up the remainder of that section as shown in Figure 2.

Figure 2. Options after selecting Oracle when creating a new RDS Database

As you can see, your next option is to select the Database management type, for which there are two options, the default Amazon RDS and Amazon RDS Custom. The Amazon RDS Custom management type requires you to upload your own installation files and patches to Amazon S3. Selecting that management type will change the UI as shown in Figure 3.

Figure 3. Selecting the Amazon RDS Custom management type

In Amazon RDS Custom, a custom engine version (CEV) is a binary volume snapshot of a database engine and specific AMI. You first upload installation files and patches to Amazon S3 from which you create CEVs. These CEVs are used as the resources for your database. While this gives you much more control over the resources used by your database as well as managing the extra options you may have purchased as add-ons, it is out of scope for this article, so select Amazon RDS instead!

The next configuration option is a checkbox to Use multitenant architecture. This is a very interesting Oracle feature that allows for the concept of a container database (CDB) that contains one or more pluggable databases (PDB). A PDB is a set of schemas, objects, and related structures that appear logically to a client application as a separate, fully functional database. RDS for Oracle currently supports only 1 PDB for each CDB.

The next configuration option is the database Edition, with Oracle Enterprise Edition and Oracle Standard Edition Two as the only available choices currently. When selecting the Enterprise edition, you will see that you must bring your own license, however, selecting the Standard edition will allow you to bring your own license or to choose a license-included version. Standard edition is significantly less expensive, so you should consider that approach unless you need the full enterprise functionality. We chose the standard edition, license-included, most-recent version.

Once you have gone through those, all the remaining sections are ones that you have seen before, as they are the same as those available for MySQL, MariaDB, and PostgreSQL (there is no serverless instance approach like there was with Amazon Aurora). However, completing them will not, by itself, enable our .NET application to connect automatically.

If we look back at our Oracle connection string:

“user id=prodotnetonaws;password=password123;data source=(DESCRIPTION=(ADDRESS=(PROTOCOL=tcp)(HOST=servernamehere)(PORT=1521))(CONNECT_DATA=(SID=databasename)))”

There are two values that are needed: the servername and the databasename. We know that once the server has been created there will be a servername, or host, but there is not yet a database with which to connect. Remember, the work you are doing right now is not creating the Oracle database; it is instead getting the Oracle server set up and available. You can create an initial database by expanding the Additional configuration section and filling out the Initial database name field in the Database options section as shown in Figure 4.

Figure 4. Creating an initial database during setup

Add in an initial database name and complete the setup. Once you click the Create button, the process will start. However, since Oracle is a much more complicated server than any of the others, this initial creation and setup process will take considerably longer than it did with the other databases.

Once your database is available, clicking on the DB identifier will bring up the database details. This is where you will be able to see the endpoint of the server. Using that value plus the database name that you created during the setup process will complete the work of updating your application to use Oracle as its primary database.

Amazon RDS – Aurora for .NET Developers 

Amazon Aurora is a MySQL and PostgreSQL-compatible relational database designed for the cloud. AWS claims that with some workloads Aurora can deliver up to 5x the throughput of MySQL and up to 3x the throughput of PostgreSQL without requiring any application changes. Aurora can do this because its storage subsystem was specifically designed to run on AWS’ fast distributed storage; in other words, Aurora was designed with cloud resources in mind, while those other “non-cloud only” databases are simply running on cloud resources. This design approach allows for automatic storage growth as needed, up to a cluster volume maximum size of 128 tebibytes (TiB) and offers 99.99% availability by replicating six copies of your data across three Availability Zones and backing up your data continuously to Amazon S3. It transparently recovers from physical storage failures; instance failover typically takes less than 30 seconds.

Note: A tebibyte (TiB) is a unit of measure used to describe computing capacity. The prefix tebi comes from the power-of-2 (binary) system for measuring data capacity. A terabyte (the unit normally seen on disk drives and RAM) is a power-of-10 multiplier, a “simpler” way of looking at the value. Thus, one terabyte = 10^12 bytes, or 1,000,000,000,000 bytes, as opposed to one tebibyte, which equals 2^40 bytes, or 1,099,511,627,776 bytes.

Also, because of this customized design, Aurora can automate and standardize database replication and clustering. The last uniquely Aurora feature is the ability to use push-button migration tools to convert existing RDS for MySQL and RDS for PostgreSQL applications to use RDS for Aurora instead. The argument for this ease of migration, and for Amazon Aurora in general, is that even though Aurora may be 20% more expensive than MySQL, Amazon claims that Aurora is 5x faster than MySQL, has 3x the throughput of standard PostgreSQL, and is able to scale to much larger datasets.

Creating an Amazon Aurora database in RDS

Let’s next look at creating a new Aurora database. First, log into the console, go to RDS, and select Create database. On the Create Database screen, select Standard create and then Aurora. This should bring up some Aurora-specific sections as shown in Figure 1.

Figure 1. Selecting edition and capacity type when building an Aurora database

The first selection, Edition, asks you to choose between a MySQL-compatible and a PostgreSQL-compatible edition.

MySQL compatible edition

The default selection when creating an Aurora database is MySQL, as shown above in Figure 1. By making this choice, values will be optimized for MySQL and default filters will be set for the options within the Available versions dropdown. The next area, Capacity type, provides two choices: Provisioned and Serverless. Selecting a provisioned capacity type will require you to select the number and class of instances that you will need to manage your workload, as well as determine your preferred Availability & durability settings, as shown in Figure 2.

Figure 2. Settings for creating a provisioned database

Selecting the serverless capacity type, on the other hand, simply requires you to select a minimum and maximum value for capacity units as shown in Figure 3. A capacity unit is comparable to a specific compute and memory configuration. Based on the minimum capacity unit setting, Aurora creates scaling rules for thresholds for CPU utilization, connections, and available memory. Aurora then reduces the resources for the DB cluster when its workload is below these thresholds, all the way down to the minimum capacity unit.

Figure 3. Capacity settings when creating a serverless database

You also have the ability to configure additional aspects around scaling using the Additional scaling configuration options. The first value is Autoscaling timeout and action. Aurora looks for a scaling point before changing capacity during the autoscaling process. A scaling point is a point in time when no transactions or long-running queries are in process. By default, if Aurora can’t find a scaling point within the specified timeout period, it will stop looking and keep the current capacity. You will need to choose the Force the capacity change option to make the change even without a scaling point. Choosing this option can affect any in-process transactions and queries. The last selection is whether you want the database to Scale the capacity to 0 ACUs when cluster is idle. The name of the option pretty much tells the story; when that item is selected, your database will basically shut off when not being used. It will then scale back up as requests are generated. There will be a performance impact on that first call; however, you will also not be charged any processing fees while the database is idle.

The rest of the configuration sections on this page are the same as they have been for the previous RDS database engines that we posted about earlier.

PostgreSQL compatible edition

Selecting to create a PostgreSQL-compatible Aurora database will give you very similar options as you would get when selecting MySQL. You have the option to select either a Provisioned or Serverless capacity type, however, when selecting the serverless capacity type you will see that the default values are higher. While the 1 ACU setting is not available, the ability to scale to 0 capacity units when the cluster is idle is still supported.

There is one additional option available when creating a provisioned system: Babelfish settings. Aurora’s approach of building compatibility with the largest open-source relational database systems has proven successful for users of those systems, and AWS took its first step toward compatibility with commercial software by releasing Babelfish for Aurora PostgreSQL. As briefly touched on earlier, Babelfish for Aurora PostgreSQL is a new capability that enables Aurora to understand commands from applications written for Microsoft SQL Server, as shown in Figure 4.

Figure 4. Accessing Amazon Aurora through Babelfish

With Babelfish, Aurora PostgreSQL now “understands” T-SQL and supports the SQL Server communications protocol, so your .NET apps that were originally written for SQL Server will work with Aurora, hopefully with minimal code changes. Babelfish is a built-in capability of Amazon Aurora and has no additional cost, although it does require a version greater than PostgreSQL 13.4, which at the time of this writing was not available on Serverless, which is why this option cannot be selected in that mode.

Amazon Aurora and .NET

As briefly touched on earlier, the primary outcome of your making a choice between PostgreSQL and MySQL is that the choice determines how you will interact with the database. Using the MySQL-compatible version of Aurora requires the MySql.EntityFrameworkCore NuGet package, while connecting to the PostgreSQL-compatible edition requires the Npgsql and Npgsql.EntityFrameworkCore.PostgreSQL packages, just as they were used earlier in those sections of this series. If you are considering using Babelfish with the PostgreSQL-compatible edition, then you would use the standard SQL Server NuGet packages that we worked with in the last few posts, as sketched below.
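
To make that concrete, here is a minimal sketch of the only code-level difference between the three approaches – the provider method called in OnConfiguring. The cluster endpoints shown are placeholders, and only one option would be active in a real context class:

protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
{
    // MySQL-compatible Aurora (MySql.EntityFrameworkCore package)
    optionsBuilder.UseMySQL("server=placeholder-cluster.rds.amazonaws.com;...");

    // PostgreSQL-compatible Aurora (Npgsql.EntityFrameworkCore.PostgreSQL package)
    // optionsBuilder.UseNpgsql("host=placeholder-cluster.rds.amazonaws.com;...");

    // Babelfish for Aurora PostgreSQL, which listens on the SQL Server port (1433)
    // (Microsoft.EntityFrameworkCore.SqlServer package)
    // optionsBuilder.UseSqlServer("Server=placeholder-cluster.rds.amazonaws.com,1433;...");
}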

This means that moving from MySQL on-premises to MySQL-compatible Aurora Serverless would require no code changes to systems accessing the database; the only change you would have to manage would be the connection string, so that you can ensure that you are talking to the new database. The same holds for PostgreSQL and even SQL Server. This approach to compatibility has made it much easier to move from well-known database systems to Amazon’s cloud-native database, Aurora.

Amazon RDS – PostgreSQL for .NET Developers

PostgreSQL is a free, open-source database that emphasizes extensibility and SQL compliance and was first released in 1996. A true competitor to commercial databases such as SQL Server and Oracle, PostgreSQL supports both online transaction processing (OLTP) and online analytical processing (OLAP) and has one of the most advanced performance features available, multi-version concurrency control (MVCC). MVCC supports the simultaneous processing of multiple transactions with almost no deadlock, so transaction-heavy applications and systems will most likely benefit from using PostgreSQL over SQL Server, and there are companies that use PostgreSQL to manage petabytes of data.

Another feature that makes PostgreSQL attractive is that not only does it support your traditional relational database approach, but it also fully supports a JSON/JSONB key/value storage approach that makes it a valid alternative to your more traditional NoSQL databases; you can now use a single product to support the two most common data access approaches. Because of its enterprise-level features and the amount of work it takes to manage and maintain them, PostgreSQL is slightly more expensive to run on Amazon RDS than MySQL and MariaDB, even though it is also open-source, free software like those products.

PostgreSQL and .NET

As with any database products that you will access from your .NET application, its level of support for .NET is important. Fortunately for us, there is a large community involved in helping ensure that PostgreSQL is relevant to .NET users.

Let’s look at what you need to do to get .NET and PostgreSQL working together. The first thing you need to do is to include the necessary NuGet packages, Npgsql and Npgsql.EntityFrameworkCore.PostgreSQL as shown in Figure 1.

Figure 1. NuGet packages required to connect to PostgreSQL

Once you have the packages, the next thing that you need to do is to configure your application to use PostgreSQL. You do this by using the UseNpgsql method when overriding the OnConfiguring method in the context class as shown below:

protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
{
    optionsBuilder.UseNpgsql("connection string here");
}

A connection string for PostgreSQL has six required fields:

  • server – server with which to connect
  • port – port number on which PostgreSQL is listening
  • user id – user name
  • password – password
  • database – database with which to connect
  • pooling – whether to use connection pooling (true or false)
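
Putting those fields together, a complete connection string might look like the following (all values are placeholders for your own server details):

“server=localhost;port=5432;user id=prodotnetonaws;password=password123;database=prodotnetonaws;pooling=true”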

When working in an ASP.NET Core application the connection string is added to the appsettings.json file as shown in Figure 2.

Figure 2. Adding a connection string to an ASP.NET Core application

Let’s now go create a PostgreSQL database.

Setting up a PostgreSQL Database on Amazon RDS

Now that we know how to set up our .NET application to access PostgreSQL, let’s go look at setting up a PostgreSQL instance. First, log into the console, go to RDS, and select Create database. On the Create Database screen, select Standard create and then PostgreSQL. You then have a lot of different versions that you can select from; however, the NuGet packages that we used in our earlier example require a reasonably modern version of PostgreSQL, so unless you have a specific reason to use an older version you should use the default, most updated version.

Once you have defined the version of PostgreSQL that you will use, your next option is to select the Template that you would like to use. Note that you only have two different templates to choose from:

·         Production – defaults are set to support high availability and fast, consistent performance.

·         Dev/Test – defaults are set in the middle of the range.

Note: Both MySQL and MariaDB had a third template, Free tier, that is not available when creating a PostgreSQL database. That does not mean that you must automatically pay, however, as the AWS Free Tier for Amazon RDS provides free use of Single-AZ micro DB instances running PostgreSQL. It is important to consider that the free usage tier is capped at 750 instance hours per month across all your RDS databases.

Selecting the template sets defaults across the rest of the setup screen and we will call those values out as we go through those items.

Once you select a template, your next setup area is Availability and durability. There are three options to choose from:

·         Multi-AZ DB cluster – As of the time of writing, this option is in preview. Selecting this option creates a DB cluster with a primary DB instance and two readable standby instances, with each instance in a different Availability Zone (AZ). It provides high availability and data redundancy, and it increases capacity to serve read workloads.

·         Multi-AZ DB instance – This option creates a primary DB instance and a standby DB instance in a different AZ. Provides high availability and data redundancy, but the standby instance doesn’t support connections for read workloads. This is the default value if you chose the Production template.

·         Single DB instance – This option creates a single DB instance with no standby instances. This is the default value if you chose the Dev/Test template.

The next section, Settings, is where you provide the DB instance identifier, or database name, and your Master username and Master password. Your database identifier value must be unique across all the database instances you have in the current region, regardless of engine option. You also have the option of having AWS auto-generate a password for you.

The next section allows you to select the DB instance class. You have the same three filters as before: Standard classes, Memory optimized classes, and Burstable classes. Selecting one of the filters changes the values in the instance drop-down box. To stay within the free tier, you need to select Burstable classes and then one of the instances with micro in the name, such as db.t3.micro, as shown in Figure 3.

Figure 3. Selecting a free-tier compatible DB instance

The next section in the setup is the Storage section, with the same options that you had available when going through the MySQL and MariaDB setups, though the default values may be different based upon the instance class that you selected. After the storage section are the Connectivity and Database authentication sections that we walked through earlier, so we will not go through them again now; they are standard across all RDS engine options. Selecting the Create database button will take you back to the RDS Databases screen, where you will get a notification that the database is being created as well as a button that you can click to access the connection details. Make sure you get the password if you selected to have AWS create your administrative password; you will only be able to access it this one time.

The pricing for PostgreSQL is slightly higher than MariaDB or MySQL when looking at compatible configurations, about 6% higher.

Selecting between PostgreSQL and MySQL/MariaDB

There are some significant differences between PostgreSQL and MySQL/MariaDB that can become meaningful when building your .NET application. Some of the more important differences are listed below. There are quite a few management and configuration differences as well, but those are not mentioned since RDS manages all of that for you!

·         Multi-Version Concurrency Control – PostgreSQL was the first DBMS to roll out multi-version concurrency control (MVCC), which means reading data never blocks writing data, and vice versa. If your database is heavily used for both reading and writing, then this may be a significant influencer.

·         More types supported – PostgreSQL natively supports NoSQL as well as a rich set of data types including Numeric Types, Boolean, Network Address, Bit String Types, and Arrays. It also supports JSON, hstore (a list of comma-separated key/value pairs), and XML, and users can even add new types.

·         Sequence support – PostgreSQL supports multiple tables taking their IDs from the same sequence, while MySQL/MariaDB do not.

·         Index flexibility – PostgreSQL can use functional and conditional (partial) indexes, which makes PostgreSQL database tuning very flexible; for example, performance doesn’t suffer when primary key values aren’t inserted sequentially.

·         Spatial capability – PostgreSQL has much richer support for spatial data management, quantity measurement, and geometric topology analysis.

While PostgreSQL is considered one of the most advanced databases around, that doesn’t mean it should automatically be your choice. Many of the advantages listed above can be considered advanced functionality that you may not need. If you simply need a place to store rarely changing data, then MySQL/MariaDB may still be a better choice. Why? Because it is less expensive and performs better than PostgreSQL for simple reads with simple joins. As always, keep your use cases in mind when selecting your database.

Note: AWS contributes to an open-source project called Babelfish for PostgreSQL, which is designed to provide the capability for PostgreSQL to understand queries from applications written for Microsoft SQL Server. Babelfish understands the SQL Server wire-protocol and T-SQL. This understanding means that you can use SQL Server drivers for .NET to talk to PostgreSQL databases. As of this writing, this functionality is not yet available in the PostgreSQL version of RDS. It is, however, available for Aurora PostgreSQL. We will go over this in more detail later in the chapter. The project can be seen at https://www.babelfishpg.org.

MariaDB, MySQL, and PostgreSQL are all open-source databases that have existed for years and that you can use anywhere, including that old server under your desk. The next database we will talk about is only available in the cloud and within RDS, Amazon Aurora.

Amazon RDS – MariaDB for .NET Developers

MariaDB is a community-developed, commercially supported fork of MySQL that is intended to remain free and open-source software under the GNU General Public License (the same license that MySQL started under). It was forked after MySQL’s acquisition by Oracle, when many of the initial MySQL developers feared that, because MySQL competed against the Oracle database, progress on MySQL would be slowed or stopped. MariaDB’s API and protocol are compatible with those used by MySQL, and it adds features to support native non-blocking operations and progress reporting. This means that all connectors, libraries, and applications that work with MySQL should also work with MariaDB. However, for recent MySQL features, MariaDB either has no equivalent yet, such as geography, or deliberately chose not to be 100% compatible. This list of incompatibilities will likely continue to grow with each version.

MariaDB and .NET

Using .NET with MariaDB is easy to configure because of how similar the APIs are for MariaDB and MySQL. In fact, they are so close that the easiest way to consume MariaDB in a .NET application is to use the same MySQL NuGet package and connection approach as we went over in the last post. The MariaDB team does not really spend time building connectors, and instead works to ensure that the connectors that are already out there, such as those built by the MySQL team, remain compatible.
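As a quick illustration, the sketch below assumes the open-source MySqlConnector NuGet package (the approach with Oracle's MySql.Data package is the same); the endpoint and credentials are placeholders.

using MySqlConnector;

// The endpoint and credentials are placeholders for your MariaDB instance.
var connectionString = "Server=mymariadb.xxxxxxxx.us-east-2.rds.amazonaws.com;" +
                       "Port=3306;User ID=admin;Password=mypassword;Database=mydb";

await using var connection = new MySqlConnection(connectionString);
await connection.OpenAsync();

// MariaDB reports its version through the same mechanism as MySQL.
await using var command = new MySqlCommand("SELECT VERSION()", connection);
Console.WriteLine(await command.ExecuteScalarAsync());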

Setting up a MariaDB Database on Amazon RDS

Now that we know how to set up our .NET application to access MariaDB, let’s go look at setting up MariaDB in Amazon RDS. Log in to the console, go to RDS, and select Create database. On the Create Database screen, select Standard create and then MariaDB. You will have a list of versions, starting with version 10.2 at the time of this writing up through the most recent release.

The rest of the set-up screen, surprisingly enough, will look eerily familiar if you just went through the MySQL setup steps; mainly because they are identical! You will have the same three options for the Template that you would like to use (Production, Dev/Test, and Free tier) as well as all of the configuration sections that follow.

Since we took the Free tier route with MySQL, let’s mix it up a little bit and go with the Dev/Test version for MariaDB and we can talk about some of the areas that we glossed over when creating the MySQL database.

The first of these comes after you have created the database instance identifier and provided the master user information, and it is entitled DB instance class. There are three options available for instance classes:

·         Standard classes (includes m classes) – provide a balance of compute, memory, and network resources and are the best all-around choice for many different database workloads.

·         Memory-optimized classes (includes r and x classes) – have large memory allocations to support those database workloads that process large data sets in memory.

·         Burstable classes (includes t classes) – are the only option available for the free tier and are designed to provide a baseline CPU performance with the ability to burst above this baseline as needed.

Selecting one of these options changes the instances that are available in the instance drop-down from which you make your selection. Selecting the standard classes as shown in Figure 1 will present a drop-down of the m-class instances.

Figure 1. DB Instance class selection for MariaDB (and MySQL)

Selecting one of the other options will filter the list in the drop-down to the applicable classes.

Caution: The lowest m instance class, db.m5.large, with 2 vCPUs, 8 GB RAM, and 4,750 Mbps network connectivity, will run you $124.83 a month in US East 2, so even a momentary creation has the chance to cost you! The t instance classes are the ones that include the free-tier versions.

The next section in the setup is the Storage section, with the same options that you had when going through the MySQL steps, though the default values may differ based upon the instance class that you selected. After the Storage section is the second “greyed out” area that we saw when we walked through setting up MySQL, Availability & durability.

One of the best features of RDS is how painless it makes the installation and configuration of a new RDBMS, especially when you think about what you would have to do to manage the configuration and maintenance of a standby instance on your own. For those instances where your data needs to be as available as possible, the ability to create (and forget about) a standby instance by selecting a radio button can’t be overlooked. Creating a replica configures a synchronous standby replica in a different Availability Zone than the primary DB instance. In the case of a planned or unplanned outage of the primary instance, RDS will automatically fail over to the standby. When using a multi-AZ deployment, however, you will pay approximately twice as much for the duplicated instances, as shown in Figure 2.

Figure 2. Estimated monthly costs with standby enabled

Once you have selected the appropriate availability option (in this case, we chose to enable a standby instance), the rest of your experience will be the same as it was for MySQL: setting up Database authentication and Additional configuration. You can keep the defaults in these sections and go ahead and create your database, or change the values as desired to get a better understanding of each area.

With identical pricing between MySQL and MariaDB, and similar APIs and other interactions, you may be wondering what the differences are between the two.

Selecting between MySQL and MariaDB

My recommendation when you are trying to select between MySQL and MariaDB? All other things being equal, go with MariaDB. Why? Primarily because of the advanced capability that MariaDB offers, such as its optimization for performance and its ability to work with large data sets. MariaDB has also spent a lot of effort adding query optimizations for queries that use joins, sub-queries, or derived tables, so its overall performance is better than you will find with MySQL. Lastly, MariaDB provides better monitoring through the introduction of microsecond precision and extended user statistics.

However, there are occasions when MySQL makes more sense than MariaDB: generally, when you are using features available in MySQL that are not available in MariaDB, such as geographical processing, JSON stored as binary objects rather than text, or MySQL authentication features such as the ability to authenticate to the database via roles or for a user to activate multiple roles at the same time.

The key is that both are available, and both provide support for .NET development in AWS. However, you do not have to limit your choices to just MariaDB or MySQL, as there is another open-source database that is supported in Amazon RDS that is worth a review. And that’ll be the next post!

Amazon DocumentDB and .NET

Amazon DocumentDB is a fully managed, scalable, and highly available document database service that uses a distributed, fault-tolerant, self-healing storage system that auto-scales up to 64 TB per cluster. Amazon DocumentDB reduces database I/O by only persisting write-ahead logs and does not need to write full buffer page syncs, avoiding slow, inefficient, and expensive data replication across network links.

This design allows for the separation of compute and storage, which means that you can scale each of those areas independently. DocumentDB is designed for 99.99% availability and replicates six copies of your data across three Availability Zones. It also continually monitors cluster instance health and automatically fails over to a read replica in the event of a failure, typically in less than 30 seconds. You can start with a single instance that delivers high durability and then, as you grow, add a second instance for high availability and easily increase the number of read-only instances. You can scale read capacity to millions of requests per second by scaling out to 15 low-latency read replicas spread across three Availability Zones.

One of the interesting characteristics is that DocumentDB was built with MongoDB compatibility. Why is that important?

MongoDB

MongoDB is a source-available, cross-platform document database that was first released in 2009 and is, by far, the most popular document database available. Since it was both first and available as open source, it tended to be adopted early by companies running on-premises and experimenting with the concept of document stores, so there is a huge number of on-premises software systems that rely completely, or in part, on MongoDB. This means that any movement of those systems to the cloud would have to come with a way to support the software written around MongoDB; otherwise, migration would face a massive impediment.

AWS realized this and released DocumentDB with compatibility for the MongoDB APIs that were available at the time. At the time of this writing (and also at release), DocumentDB implements the MongoDB 3.6 and 4.0 API responses. However, MongoDB released version 5.0 in July 2021, so there is no longer feature parity. Also, since AWS did not add support for the interim releases 4.2, 4.4, 4.4.5, and 4.4.6, it appears that there will not be additional API support moving forward.

While compatibility with the current version of MongoDB is not supported, the design of DocumentDB, together with optimizations like advanced query processing, connection pooling, and optimized recovery and rebuild, provides a very performant system. AWS claims that DocumentDB achieves twice the throughput of currently available MongoDB managed services.

Setting up a DocumentDB Database

Now that we have a bit of understanding about DocumentDB, let’s go set one up. Log in to the console, and either search for DocumentDB or find it using Services > Database > Amazon DocumentDB. Select the Create Cluster button on the dashboard to bring up the Create cluster page as shown in Figure 1.

Figure 1. Create Amazon DocumentDB Cluster screen

First, fill out the Configuration section by adding a Cluster identifier. This identifier must be unique to DocumentDB clusters within this region. Next, you’ll see a dropdown for the Engine version, which controls the MongoDB API version that you will use for the connection. This becomes important when selecting and configuring the database drivers that you will be using in your .NET application. We recommend 4.0, as this gives you the ability to support transactions. The next two fields are Instance class and Number of instances, where you define the compute and memory capacity of each instance as well as how many instances you will have. For the Instance class, the very last selection in the dropdown is db.t3.medium, which is eligible for the free tier, so we selected that class. When considering the number of instances, you can have a minimum of one instance that acts as both read and write, or you can have more than one instance, which would be a primary instance plus replicas. We chose two instances so that we can see both primary and replica instances.

You’ll see a profound difference if you compare this screen to the others that we saw when working with RDS: this screen is much simpler and seems to give you much less control over how you set up your cluster. The capability for finer control over configuration is available, however; you just need to click the slider at the lower left labeled Show advanced settings. Doing that will bring up the Network settings, Encryption, Backup, Log exports, Maintenance, Tags, and Deletion protection configuration sections that you will be familiar with from the RDS chapters.

Once you complete the setup and click the Create cluster button you will be returned to the Clusters list screen that will look like Figure 2.

Figure 2. Clusters screen immediately after creating a cluster

As Figure 2 shows, there are initially roles for a Regional cluster and a Replica instance. As the creation process continues, the regional cluster will be created first, and then the first instance in line will change roles to become the Primary instance. When creation is complete, there will be one regional cluster, one primary instance, and any number of replica instances; since we chose two instances when creating the cluster, we see one replica instance being created. When the cluster is fully available, you will also see the regions in which the cluster is available. At this point, you will have a cluster on which to run your database, but you will not yet have a database to which you can connect. But, as you will see below when we start using DocumentDB in our code, that’s ok.

DocumentDB and .NET

When looking at using this service within your .NET application, you need to consider that like Amazon Aurora, DocumentDB emulates the APIs of other products. This means that you will not be accessing the data through AWS drivers, but instead will be using MongoDB database drivers within your application. It is important to remember, however, that your database is not built using the most recent version of the MongoDB API, so you must ensure that the drivers that you do use are compatible with the version that you selected during the creation of the cluster, in our case MongoDB 4.0. Luckily, however, MongoDB provides a compatibility matrix at https://docs.mongodb.com/drivers/csharp/.

The necessary NuGet package is called MongoDB.Driver. Installing this package will bring several other packages with it, including MongoDB.Driver.Core, MongoDB.Bson, and MongoDB.Libmongocrypt. One of the first things that you may notice is that there is not an “Entity Framework” kind of package like we got used to when working with relational databases. And that makes sense, because Entity Framework is basically an Object-Relational Mapper (ORM) that helps manage the relationships between different tables, which is exactly what we do not need when working with NoSQL systems. Instead, your table is simply a collection of things. Using the MongoDB drivers, however, still allows you to use well-defined classes when mapping results to objects. It is, though, more of a deserialization process than an ORM process.

When working with DocumentDB, you need to build your connection string using the following format: mongodb://[user]:[password]@[hostname]/[database]?[options]

This means you have five components to the connection string:

·         user – the user name

·         password – the password

·         hostname – the URL to connect to

·         database – optional; the database with which to connect

·         options – a set of configuration options

It is easy to get the connection string for DocumentDB, however. In the DocumentDB console, clicking on the identifier for the regional cluster will bring you to a summary page. Lower on that page is a Connectivity & security tab that contains examples of approaches for connecting to the cluster. The last one, Connect to this cluster with an application, contains the cluster connection string as shown in Figure 3.

Figure 3. Getting DocumentDB connection string

You should note that the database name is not present in the connection string. You can either add it to the connection string or you can use a different variable as we do in the example below.
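For reference, a DocumentDB connection string typically looks something like the sketch below; every value here is a placeholder, and the options shown are typical of what the console provides.

using MongoDB.Driver;

// All values below are placeholders - copy the real string from the console.
var connectionString =
    "mongodb://myuser:mypassword@docdb-cluster.cluster-xxxxxxxx.us-east-2" +
    ".docdb.amazonaws.com:27017/?tls=true&replicaSet=rs0" +
    "&readPreference=secondaryPreferred&retryWrites=false";

var client = new MongoClient(connectionString);

// The database is selected separately rather than in the connection string.
var database = client.GetDatabase("Production");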

Once you have connected to the database and identified the collection that you want to access, you are able to access it using lambdas (the anonymous functions, not AWS’ serverless offering!). Let’s look at how that works. First, we define the model that we want to use for the deserialization process.

public class Person
{
    // Maps to the document's primary key (_id) in the collection.
    [BsonId]
    [BsonRepresentation(BsonType.ObjectId)]
    public string Id { get; set; }

    // Maps the "First" field in the document to this property.
    [BsonElement("First")]
    public string FirstName { get; set; }

    // Maps the "Last" field in the document to this property.
    [BsonElement("Last")]
    public string LastName { get; set; }
}

What you’ll notice right away is that we are not able to use a plain old CLR object (POCO) but instead must provide some MongoDB BSON attributes. Getting access to these will require the following libraries to be added to the using statements:

using MongoDB.Bson;
using MongoDB.Bson.Serialization.Attributes; 

There are many different annotations that you can set, of which we are using three:

·         BsonId – this attribute defines the document’s primary key. This means it will become the easiest and fastest field on which to retrieve a document.

·         BsonRepresentation – this attribute is primarily to support ease of use. It allows the developer to pass the parameter as type string instead of having to use an ObjectId structure, as this attribute handles the conversion from string to ObjectId.

·         BsonElement – this attribute maps the property name “Last” from the collection to the object property “LastName” and the property name “First” to the object property “FirstName”.

That means that the following JSON shows a document that would be deserialized as a Person type.

{
 "_id": "624f1c8f2d3e4a5b6c7d8e9f",
 "First": "Bill",
 "Last": "Penberthy"
}

We mentioned earlier that accessing the items within DocumentDB becomes straightforward when using lambdas. Thus, a high-level CRUD service could look similar to the code in Listing 1.

using MongoDB.Driver;

public class PersonService
{
    private readonly IMongoCollection<Person> persons;

    public PersonService()
    {
        var client = new MongoClient("connection string here");
        var database = client.GetDatabase("Production");
        persons = database.GetCollection<Person>("Persons");
    }

    public async Task<List<Person>> GetAsync() =>
        await persons.Find(_ => true).ToListAsync();

    public async Task<Person?> GetByIdAsync(string id) =>
        await persons.Find(x => x.Id == id).FirstOrDefaultAsync();

    public async Task<Person?> GetByLastNameAsync(string name) =>
        await persons.Find(x => x.LastName == name).FirstOrDefaultAsync();

    public async Task CreateAsync(Person person) =>
        await persons.InsertOneAsync(person);

    public async Task UpdateAsync(string id, Person person) =>
        await persons.ReplaceOneAsync(x => x.Id == id, person);

    public async Task RemoveAsync(string id) =>
        await persons.DeleteOneAsync(x => x.Id == id);
}

Code Listing 1. CRUD service to interact with Amazon DocumentDB

One of the conveniences of working with DocumentDB (and MongoDB, for that matter) is that creating the database and the collection is automatic when the first item (or document) is saved. Thus, creating the database and the Persons collection is just as simple as saving your first Person. Of course, that means you have to make sure that you define your database and collection correctly, but it also means that you don’t have to worry about your application not connecting if you mistype the database name – you’ll just have a new database!
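To see that in action, the following hypothetical snippet inserts the first document using the PersonService from Listing 1; if the Production database or the Persons collection does not yet exist, this single call creates both.

var service = new PersonService();

// The first insert implicitly creates the "Production" database
// and the "Persons" collection if they do not already exist.
await service.CreateAsync(new Person { FirstName = "Bill", LastName = "Penberthy" });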

AWS and In-Memory Databases

In-memory databases are typically used for applications that require “real-time access” to data, which they deliver by storing the data directly, wait for it, in memory. In-memory databases attempt to deliver microsecond latency to applications for which millisecond latency is not enough, hence the name real-time access.

In-memory databases are faster than traditional databases because they store data in random-access memory (RAM). This means they can rely on a storage manager that requires far fewer CPU instructions, since there is no need for disk I/O, and they can use internal optimization algorithms in the query processor that are simpler and faster than those in traditional databases, which must read various blocks of data rather than use the direct pointers available to in-memory databases. Figure 1 shows how all these pieces interact.

Figure 1. Internals of an in-memory database

Of course, this reliance on RAM also means that in-memory databases are more volatile than traditional databases, because data is lost when there is a loss of power or the memory crashes or gets corrupted. This lack of durability (the “D” in ACID) has been the primary blocker to more common usage, with the expense of memory being a close second. However, memory has gotten cheaper, and system architectures have evolved to use in-memory databases in ways where the lack of durability is no longer a showstopper, mainly as caches or as a “front end” to a more traditional database. This gives the best of both worlds: real-time responsiveness from the in-memory database and durability from the traditional database. Figure 2 shows an example of how this could work.

Figure 2. Using an in-memory and traditional database together

In Figure 2, we see that the application primarily interacts with the in-memory database to take advantage of its responsiveness, minimizing any kind of lag that a user may notice. In this example, a component of the application, called “Cache Manager,” is responsible for ensuring that the in-memory database and the traditional database are in sync. This component is responsible for populating the cache upon startup and then ensuring that all changes to the in-memory database are replicated through to the traditional database.

Obviously, the more complex the system, the more difficult this cache management may become. Say there is another application that changes the traditional database in the same way: this could be problematic, because the cache manager in your first application may not know about that change, and the data in the in-memory database will become stale. However, ensuring that both applications take the same approach to a shared in-memory database can help eliminate this issue, as shown in Figure 3.

Figure 3. Two applications managing their changes to a shared cache

More and more systems are relying on in-memory databases, which is why AWS has two different services, Amazon ElastiCache and Amazon MemoryDB for Redis. Before we dig too far into these services, let’s add some context by doing a quick dive into the history of in-memory databases as this can help you understand the approach that AWS took when creating their products.

Memcached, an open-source general-purpose distributed memory caching system, was released in 2003. It is still maintained, and since it was one of the first in-memory databases available, it is heavily used in various systems including YouTube, Twitter, and Wikipedia. Redis is another open-source in-memory database that was released in 2009. It quickly became popular as both a store and a cache and is likely the most used in-memory database in the world. The importance of these two databases will become obvious shortly.

Amazon ElastiCache

Amazon ElastiCache is marketed as a fully managed in-memory caching service. There are two ElastiCache engines that are supported, Redis and Memcached (see, I told you that these would come up again). Since it is fully managed, ElastiCache eliminates much of the work in hardware provisioning, software patching, setup, configuration, monitoring, failure recovery, and backups that you would have running your own instance of either Redis or Memcached. Using ElastiCache also adds support for event notifications and for monitoring and metrics, without you having to do anything other than enabling them in the console.

However, there are a few limitations to Amazon ElastiCache that you need to consider before adoption. The first is that any application using an ElastiCache cluster must be running within the AWS environment; ideally within the same VPC. Another limitation is that you do not have the ability to turn the service off without deleting the cluster itself; it is either on or gone.

Let’s do a quick dive into the two different ElastiCache engines.

Amazon ElastiCache for Redis

There are two options to choose from when using the Redis engine: non-clustered and clustered. The primary difference between the two is the number of write instances supported. In non-clustered mode, you can have a single shard (write instance) with up to five read replica nodes. This means you have one instance to which you write and multiple instances from which you can read. In clustered mode, however, you can have up to 500 shards, each with one to five read instances.

When using non-clustered Redis, you scale by adding additional read nodes so that read access to stored data is not impacted. However, this leaves the possibility of write access becoming a bottleneck. Clustered Redis takes care of that by creating multiple sets of these read/write combinations, so you can scale out reads, by adding additional read nodes per shard (as in non-clustered mode), and writes, by adding additional shards with their own read nodes. As you can probably guess, clustered Redis is going to be more expensive than non-clustered Redis.

As a developer, the key thing to remember is that since ElastiCache for Redis is a managed service offering on top of Redis, using it in your .NET application means that you will use the open-source .NET drivers for Redis, StackExchange.Redis. This provides you the ability to interact with objects stored within the database. Code Listing 1 shows a very simple cache manager for inserting and getting items in and out of Redis.

using StackExchange.Redis;
using Newtonsoft.Json;

namespace Redis_Demo
{
    // Anything we cache must expose a unique string Id to use as the key.
    public interface ICacheable
    {
        public string Id { get; set; }
    }

    public class CacheManager
    {
        private IDatabase database;

        public CacheManager()
        {
            var connection = ConnectionMultiplexer.Connect("connection string");
            database = connection.GetDatabase();
        }

        // Store the item as a JSON string, keyed by its Id.
        public void Insert<T>(T item) where T : ICacheable
        {
            database.StringSet(item.Id, Serialize(item));
        }

        // Fetch the JSON string by key and deserialize it back to T.
        public T Get<T>(string id) where T : ICacheable
        {
            var value = database.StringGet(id);
            if (value.IsNullOrEmpty)
            {
                return default(T);
            }
            return Deserialize<T>(value);
        }

        private string Serialize(object obj)
        {
            return JsonConvert.SerializeObject(obj);
        }

        private T Deserialize<T>(string obj)
        {
            return JsonConvert.DeserializeObject<T>(obj);
        }
    }
}

Code Listing 1. Saving and retrieving information from Redis

The above code assumes that all items being cached will implement the ICacheable interface, which means that there will be an Id property with a type of string. Since we are using this as our key into the database, we can assume this Id value will likely be a GUID or other guaranteed-unique value; otherwise we may get some strange behavior.
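As a usage sketch, any class implementing ICacheable can round-trip through the cache; the Customer type below is hypothetical, and the CacheManager is the one from Code Listing 1.

// A hypothetical cacheable type; Id doubles as the Redis key.
public class Customer : ICacheable
{
    public string Id { get; set; }
    public string Name { get; set; }
}

var cache = new CacheManager();
var customer = new Customer { Id = Guid.NewGuid().ToString(), Name = "Roxanne" };

cache.Insert(customer);                         // serialized to JSON and stored
var cached = cache.Get<Customer>(customer.Id);  // fetched and deserialized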

Note: There is another very common Redis library that you may see referenced in online samples, ServiceStack.Redis. The main difference is that ServiceStack is a commercially supported product that has quotas on the number of requests per hour. Going over that quota requires paying for a commercial license.

Now that we have looked at the Redis engine, let’s take a quick look at the Memcached engine.

Amazon ElastiCache for Memcached

Just as with Amazon ElastiCache for Redis, applications using Memcached can use ElastiCache for Memcached with minimal modification: you simply need the hostnames and port numbers of the various ElastiCache nodes that have been deployed.

The smallest building block of a Memcached deployment is a node. A node is a fixed-size chunk of secure, network-attached RAM that runs an instance of Memcached. A node can exist in isolation or in a relationship with other nodes – a cluster. Each node within a cluster must be the same instance type and run the same cache engine. One of the key features of ElastiCache for Memcached is its support for Auto Discovery.

Auto Discovery is the ability for client applications to identify all of the nodes within a cluster and initiate and maintain connections to those nodes. This means applications don’t have to worry about connecting to individual nodes; instead, you simply connect to a configuration endpoint.

Using ElastiCache for Memcached in a .NET application requires you to use the Memcached drivers: EnyimMemcached for older .NET versions, or EnyimMemcachedCore when working with .NET Core-based applications. Using Memcached in a .NET application is different from many of the other client libraries that you see, because it is designed to be used through dependency injection (DI) rather than by “newing” up a client like we did for Redis. Ideally, you would be using DI there as well, but we did not take that approach to keep the sample code simpler; we don’t have that option in this case.

The first thing you need to do is register the Memcached client with the container management system. If working with ASP.NET Core, you do this through the ConfigureServices method within the Startup.cs class, as shown in the following snippet.

using System;
using System.Collections.Generic;
using Enyim.Caching.Configuration;
using Microsoft.Extensions.DependencyInjection;

public void ConfigureServices(IServiceCollection services)
{
   …
   // Register the Memcached client, pointing it at the cluster's
   // configuration endpoint (port 11211 is the Memcached default).
   services.AddEnyimMemcached(o =>
        o.Servers = new List<Server> {
            new Server { Address = "end point", Port = 11211 }
        });
   …
}

Using another container management system would require the same approach, with the key being the AddEnyimMemcached method to ensure that a Memcached client is registered. This means that the Memcached version of the CacheManager class that we used with Redis would instead look like Code Listing 2.

using Enyim.Caching;

namespace Memcached_Demo
{
    public class CacheManager
    {
        private IMemcachedClient client;
        private int cacheLength = 900; // keep items for 15 minutes

        // The client is injected by the container we registered above.
        public CacheManager(IMemcachedClient memclient)
        {
            client = memclient;
        }

        // ICacheable is the same interface we defined in the Redis example.
        public void Insert<T>(T item) where T : ICacheable
        {
            client.Add(item.Id, item, cacheLength);
        }

        public T Get<T>(string id) where T : ICacheable
        {
            T value;
            if (client.TryGet<T>(id, out value))
            {
                return value;
            }
            return default(T);
        }
    }
}

Code Listing 2. Saving and retrieving information from Memcached

The main difference that you will see is that every item being persisted in the cache has a cache length, the number of seconds that the item will stay in the cache. Memcached uses a lazy caching mechanism, which means that values are only deleted when requested or when a new entry is saved and the cache is full. You can turn expiration off by using 0 as the input value; however, Memcached retains the ability to delete items before their expiration time when memory is not available for a save.
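As a quick sketch of how that plays out with the CacheManager above, the hypothetical variant below simply parameterizes the expiration.

// Hypothetical variant of Insert that lets the caller pick the expiration.
// A cacheSeconds value of 0 stores the item with no expiration time,
// though Memcached can still evict it under memory pressure.
public void Insert<T>(T item, int cacheSeconds) where T : ICacheable
{
    client.Add(item.Id, item, cacheSeconds);
}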

Choosing Between Redis and Memcached

The different ElastiCache engines can solve different needs. Redis, with its support for both clustered and non-clustered implementations, and Memcached provide different levels of support. Table 1 shows some of the considerations to weigh when evaluating the latest version of each product.

Table 1 – Considerations when choosing between Memcached and Redis

                      Memcached    Redis (non-clustered)    Redis (clustered)
Data types            Simple       Complex                  Complex
Data partitioning     Yes          No                       Yes
Modifiable clusters   Yes          Yes                      Limited
Online resharding     No           No                       Yes
Encryption            No           Yes                      Yes
Data tiering          No           Yes                      Yes
Multi-threaded        Yes          No                       No
Pub/sub capabilities  No           Yes                      Yes
Sorted sets           No           Yes                      Yes
Backup and restore    No           Yes                      Yes
Geospatial indexing   No           Yes                      Yes

An easy way to look at it: if you use simple models and need to run large nodes with multiple cores and threads, then Memcached is probably your best choice. If you have complex models and want database-level support for failover, pub/sub, and backup, then you should choose Redis.

Amazon MemoryDB for Redis

Where Amazon ElastiCache is a managed-services wrapper around an open-source in-memory database, Amazon MemoryDB for Redis is a Redis-compatible database service, which means that it is not Redis but instead a service that accepts many of the same commands as a Redis database itself. This is very similar to Amazon Aurora, which supports several different interaction approaches: MySQL and PostgreSQL. As of the time of this writing, MemoryDB supports the most recent version of Redis, 6.2.

Because MemoryDB is a fully managed service, creating the instance is straightforward. You create your cluster by determining how many shards you would like to support as well as the number of read-replicas per shard. If you remember the ElastiCache for Redis discussion earlier, this implies that the setup defaults to Redis with clustering enabled.

When looking at pricing, at the time of this writing, MemoryDB costs approximately 50% more than the comparable ElastiCache for Redis pricing. What this extra cost buys you is data durability. MemoryDB stores your entire data set in memory and uses a distributed multi-AZ transactional log to provide that durability, a feature that is unavailable in ElastiCache for Redis. MemoryDB also handles failover to a read replica much better, with failover typically happening in under 10 seconds. Other than the pricing difference, however, there is a performance cost to using MemoryDB over ElastiCache: the distributed log adds latency, so reads and writes will take several milliseconds longer in MemoryDB than the same operations would in ElastiCache for Redis.

As you can guess, since the Redis APIs are replicated, a developer will work with MemoryDB in the same way as they would with ElastiCache for Redis, using the StackExchange.Redis NuGet package. In-memory databases, whether ElastiCache or MemoryDB, offer extremely fast reads and writes because they cut out much of the complexity of disk I/O and optimize their processing to take advantage of the speed inherent in RAM. Working with them, however, is very similar to working with “regular” document and key/value databases.
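In fact, the CacheManager class from Code Listing 1 would work unchanged; only the connection string differs. A minimal sketch, with a placeholder cluster endpoint (note that MemoryDB clusters use TLS by default):

using StackExchange.Redis;

// The endpoint is a placeholder; copy the real cluster endpoint from the console.
// ssl=true is needed because MemoryDB encrypts connections by default.
var connection = ConnectionMultiplexer.Connect(
    "clustercfg.mymemorydb.xxxxxx.memorydb.us-east-2.amazonaws.com:6379,ssl=true");
var database = connection.GetDatabase();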

WTF are Ledger Databases?

A ledger database is a NoSQL database that provides an immutable, transparent, and cryptographically verifiable transaction log that is owned by a central authority. The key here is that you never actually change the data in a table. What we generally consider an update, replacing old content with new content, does not happen in a ledger database. Instead, an update adds a new version of the record; all previous versions still exist, so your update never overwrites existing data. The cryptographic verification comes into play because it ensures that this history is immutable.

Ledger databases are generally used to record economic and financial activity within an organization, such as tracking credits and debits within an account. Other common use cases center around workflows, such as tracking the steps taken on an insurance claim or tracing the movement of an item through a supply-chain network. For those of you that have ever needed to implement audit tables in a relational database, you can start to see the benefit of having all of that managed automatically.

Note: A ledger database may seem very similar to Blockchain. However, there is one significant difference: a ledger database is generally a centralized ledger, whereas Blockchain is a distributed ledger. There will be times when an organization, such as a bank or financial institution, is not comfortable with a distributed ledger and instead prefers a simpler architecture with the same guarantee of immutable and verifiable data.

Now that we have discussed, at a high level, the features of a ledger database, let’s take a look at AWS’ version, Amazon QLDB.

Amazon QLDB

In a traditional relational database architecture, the approach is to write data into tables as part of a transaction. This transaction is generally stored in a transaction log and includes all the database modifications that were made during the transaction. This allows you to replay the log and those transactions in the event of a system failure or for data replication. However, those logs are generally not immutable and usually aren’t designed to allow easy access to users.

This approach is different with QLDB, where the journal is the central feature of the database. In a practical sense, the journal is structurally similar to the transaction log; however, it takes a write-only approach to storing application data, with all writes, inserts, updates, and deletes being committed to the journal first. QLDB then uses the journal to interpret the current state of your data, materializing it in queryable, user-defined tables. These tables also provide access to the history of the data, including revisions and metadata.

While a lot of the functionality seems similar, some of the terminology is different. Table 1 shows the mapping between these terms.

Relational Term    QLDB Term
Database           Ledger
Table              Table
Index              Index
Table row          Amazon Ion document
Column             Document attribute
SQL                PartiQL
Audit logs         Journal
Table 1. Terminology map between RDBMS and QLDB

The key difference is that the table row and column from a relational database are replaced with an Amazon Ion document. Amazon Ion is a richly typed, self-describing, hierarchical data serialization format that offers interchangeable binary and text representations. The text format is a superset of JSON and is easy to read and work with, while the binary representation is efficient for storage and transmission. Ion’s rich type system enables unambiguous semantics for data (e.g., a timestamp value can be encoded using the timestamp type). This support for rich types is what allows an Ion document to conceptually replace a database row.
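To make that concrete, here is a hypothetical account document written in Ion’s text format; notice the unquoted timestamp and decimal values, which are native Ion types rather than strings:

{
  AccountNumber: "100234",
  Opened: 2022-05-01T09:30:00Z,  // a native Ion timestamp, not a string
  Balance: 125.50,               // an exact decimal value
  Active: true
}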

Note – The Ion document format provides a lot of the support lacking in the Amazon Timestream SDK, where columns and values are defined separately. This seems to be an unfortunate example of how the various service teams never seem to talk to each other or follow a single standard!

Creating an Amazon QLDB ledger is very simple. In the AWS console, navigate to Amazon QLDB and select Create ledger. Creating a ledger only requires:

1.      Ledger name – the name of your ledger, which must be unique within the region.

2.      Ledger permissions mode – there are two choices here, Standard and Allow all. The “Standard” permissions mode is the default, and it enables control over the ledger and its tables using IAM. The “Allow all” mode allows any user with access to the ledger to manage all tables, indexes, and data.

3.      Encrypt data at rest – where you choose the key used to encrypt data. There is no ability to opt out; all data will be encrypted at rest.

4.      Tags – additional labels describing your resource.

Creating a QLDB Ledger
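As an aside, if you would rather script the ledger creation than click through the console, the AWSSDK.QLDB NuGet package exposes the same operation. A minimal sketch, assuming default credentials and region and the Standard permissions mode:

using Amazon.QLDB;
using Amazon.QLDB.Model;

var qldbClient = new AmazonQLDBClient();

// Creates the same ledger we will use in the .NET examples below.
var response = await qldbClient.CreateLedgerAsync(new CreateLedgerRequest
{
    Name = "ProdDotNetOnAWSLedger",
    PermissionsMode = PermissionsMode.STANDARD
});

Console.WriteLine(response.State);  // e.g., CREATING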

You can also create a table at this time; however, there is no way through the UI to add an index. So, instead, let’s look at how you can do that in .NET, as well as how to add data to your newly created table.

.NET and Amazon QLDB

The first thing you will need to do to work with Amazon QLDB is to add the appropriate NuGet packages. The first one you should add is Newtonsoft.Json, which we will use in our examples to manage serialization and deserialization. The next ones are specific to QLDB: AWSSDK.QLDBSession, Amazon.IonObjectMapper, Amazon.QLDB.Driver, and Amazon.QLDB.Driver.Serialization. If you install the Amazon.QLDB.Driver.Serialization package, NuGet will install all the other required QLDB packages as dependencies.

The next step is to build an Amazon.QLDB.Driver.IQldbDriver. You do that with a Builder as shown below.

private IQldbDriver qldbDriver;

AmazonQLDBSessionConfig amazonQldbSessionConfig = new AmazonQLDBSessionConfig();

qldbDriver = QldbDriver.Builder()
    .WithQLDBSessionConfig(amazonQldbSessionConfig)
    .WithLedger(ledgerName)
    .Build();

Note how you cannot just “new” up a driver; instead, you must use dot notation based upon the Builder() method of the QldbDriver. Next, you need to ensure that the table and applicable indexes are set up. The code below contains a constructor that sets up the client and calls the validation code.

private IQldbDriver qldbDriver;
private IValueFactory valueFactory;

private string ledgerName = "ProdDotNetOnAWSLedger";
private string tableName = "Account";

public DataManager()
{
    valueFactory = new ValueFactory();

    AmazonQLDBSessionConfig amazonQldbSessionConfig = new AmazonQLDBSessionConfig();

    qldbDriver = QldbDriver.Builder()
        .WithQLDBSessionConfig(amazonQldbSessionConfig)
        .WithLedger(ledgerName)
        .Build();

    ValidateTableSetup(tableName);
    ValidateIndexSetup(tableName, "AccountNumber");
}

As you can see below, checking for the existence of the table is simple, as the driver has a ListTableNames method that you can use to determine the presence of your table. If the table doesn’t exist, you execute the query to create it.

Let’s pull that method out and examine it, because this is how you will do all your interactions with QLDB, by executing queries.

private void ValidateTableSetup(string tableName)
{
    if (!qldbDriver.ListTableNames().Any(x => x == tableName))
    {
        qldbDriver.Execute(y => { y.Execute($"CREATE TABLE {tableName}"); });
    }
}

The Execute method accepts a function; in this case, we used a lambda function that executes a “CREATE TABLE” command. The code takes a similar approach to creating the index; however, you have to go through more steps to determine whether the index has already been created, as you have to query a schema table first and then parse through the list of indexes on that table. That code is shown below.

private void ValidateIndexSetup(string tableName, string indexField)
{
    var result = qldbDriver.Execute(x =>
    {
        IIonValue ionTableName = this.valueFactory.NewString(tableName);
        return x.Execute("SELECT * FROM information_schema.user_tables WHERE name = ?", ionTableName);
    });

    var resultList = result.ToList();
    bool isListed = false;

    if (resultList.Any())
    {
        // Each table entry carries a list of its index expressions.
        IIonList indexes = resultList.First().GetField("indexes");
        foreach (IIonValue index in indexes)
        {
            string expr = index.GetField("expr").StringValue;
            if (expr.Contains(indexField))
            {
                isListed = true;
                break;
            }
        }
    }
    if (!isListed)
    {
        qldbDriver.Execute(y => y.Execute($"CREATE INDEX ON {tableName}({indexField})"));
    }
}

As mentioned earlier, a save is basically a query that is executed. A simple save method, together with the ToIonValue helper it relies on, is shown below:

public void Save(object item)
{
    qldbDriver.Execute(x =>
        {
            x.Execute($"INSERT INTO {tableName} ?", ToIonValue(item));
        }
    );
}

private IIonValue ToIonValue(object obj)
{
    return IonLoader.Default.Load(JsonConvert.SerializeObject(obj));
}

This example, as well as the earlier check for whether an index exists, has some dependence on Ion. Remember, Ion’s text format is richly typed JSON, so it expects values to be similarly represented, which is why there are two different approaches for converting an item to an Ion value, even something as simple as the tableName, for comparison. The first is through the Amazon.IonDotnet.Tree.ValueFactory object, as shown in the index validation snippet, while the other is through the Amazon.IonDotnet.Builders.IonLoader, as shown in the ToIonValue method.

Once you have the ledger, table, and any indexes you may want set up, the next step is to save some data into the table, which, as the Save method above shows, is done by executing a SQL-like INSERT command.

Getting it out of the database is a little bit trickier, as you want the result to be cast into the appropriate data model. The approach for retrieving an item from the database and converting it into a plain old CLR object (POCO) is shown below.

public List<T> Get<T>(string accountNumber)
{
    IIonValue ionKey = valueFactory.NewString(accountNumber);

    return qldbDriver.Execute(x =>
    {
        IQuery<T> query = x.Query<T>(
            $"SELECT * FROM {tableName} AS d WHERE d.AccountNumber = ?", ionKey);
        var results = x.Execute(query);
        return results.ToList();
    });
}

You can see that this is a two-step process. First, you create a generic query defined by the SELECT text and the appropriate model for the return set. You then Execute that query, which brings back an Amazon.QLDB.Driver.Generic.IResult<T> object that can be converted into a list of the requested items matching the AccountNumber passed into the function.

That’s a quick review of the general functionality of QLDB. Let’s look at one more specific case, the history of an item, one of the prime drivers for using a ledger database. You do this by using the history function, a PartiQL extension that returns revisions from the system-defined view of your table. The syntax for using the history function is:

SELECT * FROM history(table [, `start-time` [, `end-time` ] ] ) AS h
[ WHERE h.metadata.id = 'id' ]

The start time and end time values are optional. Using a start time will select any versions that were active at the start time plus any subsequent versions up until the end time. If no start time is provided, all versions of the document are retrieved, and the current time is used in the query if the end time is not explicitly defined. However, as you can see, the optional WHERE clause uses a metadata.id value, which we have not yet talked about. This metadata.id value is used by QLDB to uniquely identify each document and can be accessed through the metadata of the document. One way to get this information is shown below:

public string GetMetadataId(Account item)
{
    IIonValue ionKey = valueFactory.NewString(item.AccountNumber);
    IEnumerable<IIonValue> result = new List<IIonValue>();

    qldbDriver.Execute(x =>
    {
        // The committed view exposes system metadata alongside your data.
        result = x.Execute($"SELECT d.metadata.id AS id FROM _ql_committed_{tableName} AS d WHERE d.data.AccountNumber = ?", ionKey);
    });

    IIonValue working = result.First();
    return working.GetField("id").StringValue;
}

This code snippet accesses the “committed” view, one of the different views QLDB provides for getting information about your data. Taking the result from that query and plugging it into the previous snippet where we queried using the history function will get you the history of that document for the time period defined in the query. Thus, you can access the current version as well as all of the history, all by executing SQL-like commands.
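Putting those two pieces together, the following sketch (reusing the driver, value factory, Account model, and tableName field from the earlier snippets) retrieves all revisions of a document; note that each revision returned by the history function nests the document payload under h.data, so we project that out:

public List<Account> GetHistory(string metadataId)
{
    IIonValue ionId = valueFactory.NewString(metadataId);

    return qldbDriver.Execute(x =>
    {
        // h.data.* projects just the document payload from each revision.
        IQuery<Account> query = x.Query<Account>(
            $"SELECT h.data.* FROM history({tableName}) AS h WHERE h.metadata.id = ?",
            ionId);
        return x.Execute(query).ToList();
    });
}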

There is a lot more we could go into about using Amazon QLDB and .NET together, probably a whole book! To recap, however, Amazon QLDB is a ledger database, which means it provides an immutable, transparent, and cryptographically verifiable transaction log, and every revision can be accessed and evaluated independently of the active version.