The mystery of lost values: Understanding ASCII vs. UTF-8 in Database Queries

The mystery of lost values: Understanding ASCII vs. UTF-8 in Database Queries

Understanding ASCII vs. UTF-8 in Database Queries: A Practical Guide

 

When dealing with databases, understanding how different character encodings impact queries is crucial. Two common encoding standards are ASCII and UTF-8. This blog post delves into their differences, how they affect case-sensitive queries, and provides practical examples to illustrate these concepts.

ASCII vs. UTF-8: What’s the Difference?

 

ASCII (American Standard Code for Information Interchange)

 

  • Description: A character encoding standard using 7 bits to represent each character, allowing for 128 unique symbols. These include control characters (like newline), digits, uppercase and lowercase English letters, and some special symbols.
  • Range: 0 to 127.

 

UTF-8 (8-bit Unicode Transformation Format)

 

  • Description: A variable-width character encoding capable of encoding all 1,112,064 valid character code points in Unicode using one to four 8-bit bytes. UTF-8 is backward compatible with ASCII.
  • Range: Can represent characters in a much wider range, including all characters in all languages, as well as many symbols and special characters.

 

ASCII and UTF-8 Position Examples

 

Let’s compare the positions of some characters in both ASCII and UTF-8:

Character ASCII Position UTF-8 Position
A 65 65
B 66 66
Y 89 89
Z 90 90
[ 91 91
\ 92 92
] 93 93
^ 94 94
_ 95 95
` 96 96
a 97 97
b 98 98
y 121 121
z 122 122
Last ASCII (DEL) 127 127
ÿ Not present 195 191 (2 bytes)

Case Sensitivity in Database Queries

 

Case sensitivity can significantly impact database queries, as different encoding schemes represent characters differently.

 

ASCII Example

 

-- Case-sensitive query in ASCII-encoded database
SELECT * FROM users WHERE username = 'Alice';
-- This will not return rows with 'alice', 'ALICE', etc.

UTF-8 Example

 

-- Case-sensitive query in UTF-8 encoded database
SELECT * FROM users WHERE username = 'Ålice';
-- This will not return rows with 'ålice', 'ÅLICE', etc.

Practical Example with Positions

 

For ASCII, the characters included in the range >= 'A' and <= 'z' are:

  • A has a position of 65.
  • a has a position of 97.

In a case-sensitive search, these positions are distinct, so A is not equal to a.

For UTF-8, the characters included in this range are the same since UTF-8 is backward compatible with ASCII for characters in this range.

 

Query Example

 

Let’s demonstrate a query example for usernames within the range >= 'A' and <= 'z'.

-- Query for usernames in the range 'A' to 'z'
SELECT * FROM users WHERE username >= 'A' AND username <= 'z';

Included Characters

 

Based on the ASCII positions, the range >= 'A' and <= 'z' includes:

  • All uppercase letters: A to Z (positions 65 to 90)
  • Special characters: [, \, ], ^, _, and ` (positions 91 to 96)
  • All lowercase letters: a to z (positions 97 to 122)

Practical Example with Positions

 

Given the following table:

-- Create a table
CREATE TABLE users (
    id INT PRIMARY KEY,
    username VARCHAR(255) CHARACTER SET utf8 COLLATE utf8_bin
);

-- Insert some users
INSERT INTO users (id, username) VALUES (1, 'Alice');   -- A = 65, l = 108, i = 105, c = 99, e = 101
INSERT INTO users (id, username) VALUES (2, 'alice');   -- a = 97, l = 108, i = 105, c = 99, e = 101
INSERT INTO users (id, username) VALUES (3, 'Ålice');   -- Å = 195 133, l = 108, i = 105, c = 99, e = 101
INSERT INTO users (id, username) VALUES (4, 'ålice');   -- å = 195 165, l = 108, i = 105, c = 99, e = 101
INSERT INTO users (id, username) VALUES (5, 'Z');       -- Z = 90
INSERT INTO users (id, username) VALUES (6, 'z');       -- z = 122
INSERT INTO users (id, username) VALUES (7, 'ÿ');       -- ÿ = 195 191
INSERT INTO users (id, username) VALUES (8, '_special');-- _ = 95, s = 115, p = 112, e = 101, c = 99, i = 105, a = 97, l = 108
INSERT INTO users (id, username) VALUES (9, 'example'); -- e = 101, x = 120, a = 97, m = 109, p = 112, l = 108, e = 101

Query Execution

 

-- Execute the query
SELECT * FROM users WHERE username >= 'A' AND username <= 'z';

Query Result

 

This query will include the following usernames based on the range:

  • Alice (A = 65, l = 108, i = 105, c = 99, e = 101)
  • Z (Z = 90)
  • example (e = 101, x = 120, a = 97, m = 109, p = 112, l = 108, e = 101)
  • _special (_ = 95, s = 115, p = 112, e = 101, c = 99, i = 105, a = 97, l = 108)
  • alice (a = 97, l = 108, i = 105, c = 99, e = 101)
  • z (z = 122)

However, it will not include:

  • Ålice (Å = 195 133, l = 108, i = 105, c = 99, e = 101, outside the specified range)
  • ålice (å = 195 165, l = 108, i = 105, c = 99, e = 101, outside the specified range)
  • ÿ (ÿ = 195 191, outside the specified range)

Conclusion

 

Understanding the differences between ASCII and UTF-8 character positions and ranges is crucial when performing case-sensitive queries in databases. For example, querying for usernames within the range >= 'A' and <= 'z' will include a specific set of characters based on their ASCII positions, impacting which rows are returned in your query results.

By grasping these concepts, you can ensure your database queries are accurate and efficient, especially when dealing with different encoding schemes.

Discovering the Simplicity of C# in Blockchain Development with Stratis

Discovering the Simplicity of C# in Blockchain Development with Stratis

Introduction

Blockchain technology has revolutionized various industries by providing a decentralized and secure way to manage data and transactions. At the heart of this innovation are smart contracts—self-executing contracts with the terms directly written into code. My journey into blockchain development began with the excitement of these possibilities, but it also came with challenges, particularly with the Solidity programming language. However, everything changed when I discovered the Stratis platform, which supports smart contracts using C#, making development much more accessible for me. In this article, I’ll share my experiences, challenges, and the eventual breakthrough that came with Stratis.

Challenges with Solidity

Solidity is the most popular language for writing smart contracts on Ethereum, but it has a steep learning curve. My background in programming didn’t include a lot of JavaScript-like languages, so adapting to Solidity’s syntax and concepts was daunting. The process of writing, testing, and deploying smart contracts often felt cumbersome. Debugging was a particular pain point, with cryptic error messages and a lack of mature tooling compared to more established programming environments.

The complexity and frustration of dealing with these issues made me seek an alternative that could leverage my existing programming skills. I wanted a platform that was easier to work with and more aligned with languages I was already comfortable with. This search led me to discover Stratis.

Introduction to Stratis

Stratis is a blockchain development platform designed to meet the needs of enterprises and developers by offering a simpler and more efficient way to build blockchain solutions. What caught my attention was its support for C#—a language I was already proficient in. Stratis allows developers to create smart contracts using C#, integrating seamlessly with the .NET ecosystem.

This discovery was a game-changer for me. The prospect of using a familiar language in a robust development environment like Visual Studio, combined with the powerful features of Stratis, promised a much smoother and more productive development experience.

Why Stratis Stood Out

The primary benefit of using C# over Solidity is the familiarity and maturity of the development tools. With C#, I could leverage the rich ecosystem of libraries, tools, and frameworks available in the .NET environment. This not only sped up the development process but also reduced the time spent on debugging and testing.

Stratis offers a comprehensive suite of tools designed to simplify blockchain development. The Stratis Full Node, for instance, provides a fully functional blockchain node that can be easily integrated into existing applications. Additionally, Stratis offers a smart contract template for Visual Studio, making it straightforward to start building and deploying smart contracts.

Another significant advantage is the support and community around Stratis. The documentation is thorough, and the community is active, providing a wealth of resources and assistance for developers at all levels.

Conclusion

Transitioning from Solidity to Stratis was a pivotal moment in my blockchain development journey. The challenges I faced with Solidity were mitigated by the ease and familiarity of C#. Stratis provided a robust and efficient platform that significantly improved my development workflow.

In the next article, I will dive into the practical steps of setting up the Stratis development environment. We’ll cover everything you need to get started, from installing the necessary tools to configuring your first Stratis Full Node. Stay tuned for a detailed guide that will set the foundation for your journey into C# smart contract development.

Design Patterns for Library Creators in Dotnet

Design Patterns for Library Creators in Dotnet

Hello there! Today, we’re going to delve into the fascinating world of design patterns. Don’t worry if you’re not a tech whiz – we’ll keep things simple and relatable. We’ll use the SyncFramework as an example, but our main focus will be on the design patterns themselves. So, let’s get started!

What are Design Patterns?

Design patterns are like blueprints – they provide solutions to common problems that occur in software design. They’re not ready-made code that you can directly insert into your program. Instead, they’re guidelines you can follow to solve a particular problem in a specific context.

SOLID Design Principles

One of the most popular sets of design principles is SOLID. It’s an acronym that stands for five principles that help make software designs more understandable, flexible, and maintainable. Let’s break it down:

  1. Single Responsibility Principle: A class should have only one reason to change. In other words, it should have only one job.
  2. Open-Closed Principle: Software entities should be open for extension but closed for modification. This means we should be able to add new features or functionality without changing the existing code.
  3. Liskov Substitution Principle: Subtypes must be substitutable for their base types. This principle is about creating new derived classes that can replace the functionality of the base class without breaking the application.
  4. Interface Segregation Principle: Clients should not be forced to depend on interfaces they do not use. This principle is about reducing the side effects and frequency of required changes by splitting the software into multiple, independent parts.
  5. Dependency Inversion Principle: High-level modules should not depend on low-level modules. Both should depend on abstractions. This principle allows for decoupling.

Applying SOLID Principles in SyncFramework

The SyncFramework is a great example of how these principles can be applied. Here’s how:

  • Single Responsibility Principle: Each component of the SyncFramework has a specific role. For instance, one component is responsible for tracking changes, while another handles conflict resolution.
  • Open-Closed Principle: The SyncFramework is designed to be extensible. You can add new data sources or change the way data is synchronized without modifying the core framework.
  • Liskov Substitution Principle: The SyncFramework uses base classes and interfaces that allow for substitutable components. This means you can replace or modify components without affecting the overall functionality.
  • Interface Segregation Principle: The SyncFramework provides a range of interfaces, allowing you to choose the ones you need and ignore the ones you don’t.
  • Dependency Inversion Principle: The SyncFramework depends on abstractions, not on concrete classes. This makes it more flexible and adaptable to changes.

 

And that’s a wrap for today! But don’t worry, this is just the beginning. In the upcoming series of articles, we’ll dive deeper into each of these principles. We’ll explore how they’re applied in the source code of the SyncFramework, providing real-world examples to help you understand these concepts better. So, stay tuned for more exciting insights into the world of design patterns! See you in the next article!

 

Related articles

If you want to learn more about data synchronization you can checkout the following blog posts:

  1. Data synchronization in a few words – https://www.jocheojeda.com/2021/10/10/data-synchronization-in-a-few-words/
  2. Parts of a Synchronization Framework – https://www.jocheojeda.com/2021/10/10/parts-of-a-synchronization-framework/
  3. Let’s write a Synchronization Framework in C# – https://www.jocheojeda.com/2021/10/11/lets-write-a-synchronization-framework-in-c/
  4. Synchronization Framework Base Classes – https://www.jocheojeda.com/2021/10/12/synchronization-framework-base-classes/
  5. Planning the first implementation – https://www.jocheojeda.com/2021/10/12/planning-the-first-implementation/
  6. Testing the first implementation – https://youtu.be/l2-yPlExSrg
  7. Adding network support – https://www.jocheojeda.com/2021/10/17/syncframework-adding-network-support/

 

A Beginner’s Guide to System.Security.SecurityRules and SecuritySafeCritical in C#

A Beginner’s Guide to System.Security.SecurityRules and SecuritySafeCritical in C#

 

A Beginner’s Guide to System.Security.SecurityRules and SecuritySafeCritical in C#

Introduction

In the .NET Framework, security is a critical concern. Two attributes, System.Security.SecurityRules and SecuritySafeCritical, play a significant role in enforcing Code Access Security (CAS).

System.Security.SecurityRules

The System.Security.SecurityRules attribute specifies the set of security rules that the common language runtime should enforce for an assembly. It has two levels: Level1 and Level2.

Level1

Level1 uses the .NET Framework version 2.0 transparency rules. Here are the key rules for Level1:

  • Public security-critical types and members are treated as security-safe-critical outside the assembly.
  • Security-critical types and members must perform a link demand for full trust to enforce security-critical behavior when they are accessed by external callers.
  • Level1 rules should be used only for compatibility, such as for .NET Framework 2.0 assemblies.

[assembly: System.Security.SecurityRules(System.Security.SecurityRuleSet.Level1)]
public class MyClass
{
    // Your code here
}

SecuritySafeCritical

The SecuritySafeCritical attribute identifies types or members as security-critical and safely accessible by transparent code. Code marked with SecuritySafeCritical must undergo a rigorous security audit to ensure that it can be used safely in a secure execution environment. It must validate the permissions of callers to determine whether they have authority to access protected resources used by the code.


[System.Security.SecuritySafeCritical]
public void MyMethod()
{
    // Your code here
}

Relationship between System.Security.SecurityRules and SecuritySafeCritical

The System.Security.SecurityRules and SecuritySafeCritical attributes work together to enforce security in .NET Framework. An assembly marked with SecurityRules(SecurityRuleSet.Level1) uses the .NET Framework version 2.0 transparency rules, where public security-critical types and members are treated as security-safe-critical outside the assembly.

The concept of trusted Code

Trusted code refers to code that has been granted certain permissions and is considered safe to execute. It’s a combination of techniques, policies, and procedures for which there is no plausible scenario in which a document retrieved from or reproduced by the system could differ substantially from the document that is originally stored. In other words, trusted code certifies that electronically stored information (ESI) is an authentic copy of the original document or information.

Use Cases and Examples

Consider a scenario where you have a method that performs a critical operation, such as accessing a protected resource. You want to ensure that this method can only be called by trusted code. You can mark this method as SecuritySafeCritical to enforce this.


[System.Security.SecuritySafeCritical]
public void AccessProtectedResource()
{
    // Code to access protected resource
}

In this case, the AccessProtectedResource method can only be called by code that has been granted the necessary permissions. This helps to prevent unauthorized access to the protected resource.

Conclusion

Understanding the System.Security.SecurityRules and SecuritySafeCritical attributes is crucial when developing secure .NET applications. By using these attributes correctly, you can enforce robust security rules and protect your application from potential threats. Always remember, with great power comes great responsibility!

I hope this article helps you understand these concepts better. Happy coding! 😊

 

Introduction to Machine Learning in C#: Spam Detection using Binary Classification

Introduction to Machine Learning in C#: Spam Detection using Binary Classification

Introduction to Machine Learning in C#: Spam using Binary Classification

This example demonstrates the basics of machine learning in C# using ML.NET, Microsoft’s machine learning framework specifically designed for .NET applications. ML.NET offers a versatile, cross-platform framework that simplifies integrating machine learning into .NET applications, making it accessible for developers familiar with the .NET ecosystem.

Technologies Used

  • C#: A modern, object-oriented programming language developed by Microsoft, which is widely used for a variety of applications. In this example, C# is used to define data models, process data, and implement the machine learning pipeline.
  • ML.NET: An open-source and cross-platform machine learning framework for .NET. It is used in this example to create a machine learning model for classifying emails as spam or not spam. ML.NET simplifies the process of training, evaluating, and consuming machine learning models in .NET applications.
  • .NET Core: A cross-platform version of .NET for building applications that run on Windows, Linux, and macOS. It provides the runtime environment for our C# application.

The example focuses on a simple spam detection system. It utilizes text data processing and binary classification, two common tasks in machine learning, to classify emails into spam and non-spam categories. This is achieved through the use of a logistic regression model, a fundamental algorithm for binary classification problems.

Creating an NUnit Test Project in Visual Studio Code

 

           Setting up NUnit for DecisionTreeDemo

 

    • Install .NET Core SDK

      Download and install the .NET Core SDK from the .NET official website.

    • Install Visual Studio Code

      Download and install Visual Studio Code (VS Code) from here. Also, install the C# extension for VS Code by Microsoft.

    • Create a New .NET Core Project

      Open VS Code, and in the terminal, create a new .NET Core project:

      dotnet new console -n DecisionTreeDemo
      cd DecisionTreeDemo
    • Add the ML.NET Package

      Add the ML.NET package to your project:

      dotnet add package Microsoft.ML
    • Create the Test Project

      Create a separate directory for your test project, then initialize a new test project:

          
      mkdir DecisionTreeDemo.Tests
      cd DecisionTreeDemo.Tests
      dotnet new nunit
    • Add Required Packages to Test Project

      Add the necessary NUnit and ML.NET packages:

      dotnet add package NUnit
      dotnet add package Microsoft.NET.Test.Sdk
      dotnet add package NUnit3TestAdapter
      dotnet add package Microsoft.ML
    • Reference the Main Project

      Reference the main project:

          dotnet add reference ../DecisionTreeDemo/DecisionTreeDemo.csproj
    • Write Test Cases

      Write NUnit test cases within your test project to test different functionalities of your ML.NET application.

      Define the Data Model for the Email

      Include the content of the email and whether it’s classified as spam.

          
      public class Email
      {
          [LoadColumn(0)]
          public string Content { get; set; }
      
          [LoadColumn(1), ColumnName("Label")]
          public bool IsSpam { get; set; }
      }
      

      Define the Model for Spam Prediction

      This model is used to determine whether an email is spam.

        
      public class SpamPrediction
      {
          [ColumnName("PredictedLabel")]
          public bool IsSpam { get; set; }
      }
      

      Write the test case

             
      // Create a new ML context for the application, which is a starting point for ML.NET operations.
              var mlContext = new MLContext();
      
              // Example dataset of emails. In a real-world scenario, this would be much larger and possibly loaded from an external source.
              var data = new List
              {
                  new Email { Content = "Buy cheap products now", IsSpam = true },
                  new Email { Content = "Meeting at 3 PM", IsSpam = false },
                  // Additional data can be added here...
              };
      
              // Load the data into the ML.NET data model.
              var trainData = mlContext.Data.LoadFromEnumerable(data);
      
              // Define the data processing pipeline. Here we are featurizing the text (i.e., converting text into numeric features) and then     applying a logistic regression model.
              var pipeline = mlContext.Transforms.Text.FeaturizeText("Features", nameof(Email.Content))
                  .Append(mlContext.BinaryClassification.Trainers.SdcaLogisticRegression());
      
              // Train the model on the loaded data.
              var model = pipeline.Fit(trainData);
      
              // Create a prediction engine for making predictions on individual data samples.
              var predictionEngine = mlContext.Model.CreatePredictionEngine<Email, SpamPrediction>(model);
      
              // Create a sample email to test the model.
              var sampleEmail = new Email { Content = "Special discount, buy now!" };
              var prediction = predictionEngine.Predict(sampleEmail);
      
              // Output the prediction to the console.
              Debug.WriteLine($"Email: '{sampleEmail.Content}' is {(prediction.IsSpam ? "spam" : "not spam")}");
              Assert.IsTrue(prediction.IsSpam);
      
    • Running Tests

      Run the tests with the following command:

      dotnet test

As you can see the test will pass because the sample email contains the word “buy” that was used in the training data and was labeled as spam

You can download the source code for this article here

This article has explored the fundamentals of machine learning in C# using the ML.NET framework. By defining specific data models and utilizing ML.NET’s powerful features, we demonstrated how to build a simple yet effective spam detection system. This example serves as a gateway into the vast world of machine learning, showcasing the potential for integrating AI technologies into .NET applications. The skills and concepts learned here lay the groundwork for further exploration and development in the exciting field of machine learning and artificial intelligence.