Setting Up WSL 2: My Development Environment Scripts

by Joche Ojeda | Feb 24, 2025 | Linux, Postgres, SyncFrameworkV2, Ubuntu, WSL

After a problematic Windows update on my Surface computer prevented me from compiling .NET applications, I spent days trying various fixes without success. Eventually, I had to format my computer and start fresh, which meant setting up everything again: Visual Studio, testing databases, and all the other development tools.

To make future setups easier, I created a collection of WSL 2 scripts that automate the installation of tools I frequently use, such as PostgreSQL and MySQL for testing purposes. While these scripts contain some practices that wouldn't be recommended for production (like hardcoded passwords), they're designed specifically for testing environments. The passwords used are already present in the sync framework source code, so there's no additional security risk.

I decided to share these scripts not as a perfect solution, but as a starting point for others who might need to set up similar testing environments. You can use them as inspiration for your own scripts or change the default passwords to match your needs.

Note that these are specifically for testing purposes – particularly for working with the sync framework – and the hardcoded credentials should never be used in a production environment.

https://github.com/egarim/MyWslScripts

LDAP Scripts

ldap-setup.sh (egarim/MyWslScripts, master branch)

add-ldap-user.sh (egarim/MyWslScripts, master branch)

MySQL

install_mysql.sh (egarim/MyWslScripts, master branch)

Postgres

install_postgres.sh (egarim/MyWslScripts, master branch)

Redis

redis-install.sh (egarim/MyWslScripts, master branch)

Let me know if you’d like me to share the actual scripts in a follow-up post!

Hard to Kill: Why Auto-Increment Primary Keys Can Make Data Sync Die Harder

by Joche Ojeda | Jan 22, 2025 | ADO, ADO.NET, C#, Data Synchronization, EfCore, XPO, XPO Database Replication

Working with the SyncFramework, I’ve noticed a recurring pattern when discussing schema design with customers. One crucial question that often surprises them is about their choice of primary keys: “Are you using auto-incremental integers or unique identifiers (like GUIDs)?”

Approximately 90% of users rely on auto-incremental integer primary keys. While this seems like a straightforward choice, it can create significant challenges for data synchronization. Let’s dive deep into how different database engines handle auto-increment values and why this matters for synchronization scenarios.

Database Implementation Deep Dive

SQL Server

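SQL Server uses the IDENTITY property. The current value is persisted in system metadata, exposed through the sys.identity_columns catalog view, and values are cached in memory for performance. After a restart, the engine resumes from the last persisted value. After an unexpected shutdown, values that had been cached but not used are lost, which shows up as a jump in the numbering. The values are managed as 8-byte numbers internally, with new ranges allocated when the cache is exhausted.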

MySQL

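MySQL's InnoDB engine keeps each table's auto-increment counter in memory. Before MySQL 8.0, the counter was not persisted. After a restart, InnoDB re-initialized it from the maximum value in the column (the equivalent of SELECT MAX(ai_col)), so values freed by deleting the most recent rows could be reused. Since MySQL 8.0, the current maximum is written to the redo log whenever it changes and saved to the data dictionary at checkpoints, so it survives restarts.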

PostgreSQL

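PostgreSQL takes a different approach, using separate sequence objects. A sequence is a special kind of relation registered in the pg_class catalog. Since PostgreSQL 10, its parameters (increment, min/max values, cache size) live in the pg_sequence catalog, while the sequence's own relation file holds its current state, such as the last value. Sequence changes are written to the WAL for durability. nextval() is never rolled back, so aborted transactions leave gaps.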

Oracle

Oracle traditionally uses sequences combined with triggers. Version 12c and later also support identity columns, which are backed by system-generated sequences. Sequence information is stored in the SEQ$ data dictionary table, which tracks the last number used, the cache size, and the increment value.

The Synchronization Challenge

This diversity in implementation creates several challenges for data synchronization:

Unpredictable Sequence Generation: Even within the same database engine, gaps can occur due to rolled-back transactions or server restarts.
Infrastructure Dependencies: The mechanisms for generating next values are deeply embedded within each database engine and aren’t easily accessible to frameworks like Entity Framework or XPO.
Cross-Database Complexity: When synchronizing across different database instances, coordinating auto-increment values becomes even more complex.
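To make the first and third points concrete, here is a toy model in Python. It is not how any database engine is implemented, and AutoIncrementTable, GuidTable and merge are hypothetical names used only for illustration. It shows two things. First, a rolled-back insert still consumes a counter value and leaves a gap. Second, two replicas that number rows independently produce the same keys, so a naive merge silently drops data.

```python
import uuid


class AutoIncrementTable:
    """Toy table whose primary key comes from a local counter.

    Like IDENTITY, AUTO_INCREMENT or a sequence, the counter hands out a
    value immediately and does not give it back if the insert rolls back.
    """

    def __init__(self):
        self.next_value = 1
        self.rows = {}

    def insert(self, payload, commit=True):
        key = self.next_value   # the value is consumed here...
        self.next_value += 1
        if commit:              # ...whether or not the insert commits
            self.rows[key] = payload
        return key


class GuidTable:
    """Same toy table, but keys are random (version 4) UUIDs."""

    def __init__(self):
        self.rows = {}

    def insert(self, payload):
        key = uuid.uuid4()
        self.rows[key] = payload
        return key


def merge(target, source):
    """Copy rows from source into target; return keys that already existed.

    Existing target rows win, so each returned key is a source row that
    was silently dropped.
    """
    conflicts = [key for key in source.rows if key in target.rows]
    for key, payload in source.rows.items():
        target.rows.setdefault(key, payload)
    return conflicts


# Gap: the rolled-back insert still consumed key 2.
office = AutoIncrementTable()
office.insert("order A")
office.insert("order B", commit=False)
office.insert("order C")
print(sorted(office.rows))     # [1, 3]

# Collision: an offline laptop replica also starts counting at 1.
laptop = AutoIncrementTable()
laptop.insert("order X")
laptop.insert("order Y")
print(merge(office, laptop))   # [1]  ("order X" was lost)

# With GUID keys, independently generated rows do not clash.
office_g, laptop_g = GuidTable(), GuidTable()
office_g.insert("order A")
laptop_g.insert("order X")
print(merge(office_g, laptop_g))  # []
```

Real sync engines detect such conflicts rather than dropping rows. The point is that with local counters the conflict exists in the first place.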

The GUID Alternative

Using GUIDs (Globally Unique Identifiers) as primary keys offers a solution to these synchronization challenges. While GUIDs come with their own set of considerations, they provide uniqueness across distributed systems without requiring centralized coordination. Any node can generate a key on its own, and the probability of two randomly generated GUIDs colliding is negligible.

Traditional GUID Concerns

Index fragmentation
Storage size
Performance impact

Modern Solutions

These concerns have been addressed through:

Sequential GUID generation techniques
Improved indexing in modern databases
Optimizations in .NET 9, such as the new Guid.CreateVersion7() method for generating time-ordered (version 7) GUIDs
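One sequential GUID technique is the version 7 UUID defined in RFC 9562, which is the format .NET 9's Guid.CreateVersion7() produces. It puts a 48-bit Unix millisecond timestamp in the most significant bits and fills the rest with random data. As a result, new keys tend to land at the end of an index instead of at random positions, which is what reduces fragmentation. The function below (uuid7_sketch, a hypothetical name) is a minimal illustration of that bit layout, not a production generator. Within a single millisecond its keys are not ordered, because RFC 9562 leaves room for counters there that this sketch omits.

```python
import os
import time
import uuid


def uuid7_sketch(unix_ms=None):
    """Build a version-7 UUID: 48-bit Unix ms timestamp + random bits.

    Layout, most significant bit first:
    unix_ts_ms (48) | version (4) | rand_a (12) | variant (2) | rand_b (62)
    """
    if unix_ms is None:
        unix_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits
    value = (unix_ms & ((1 << 48) - 1)) << 80     # timestamp in the top 48 bits
    value |= 0x7 << 76                             # version 7
    value |= ((rand >> 62) & 0xFFF) << 64          # rand_a: 12 bits
    value |= 0b10 << 62                            # RFC variant bits
    value |= rand & ((1 << 62) - 1)                # rand_b: 62 bits
    return uuid.UUID(int=value)


# Keys created in later milliseconds compare greater than earlier ones.
start = 1_700_000_000_000
keys = [uuid7_sketch(start + i) for i in range(5)]
print(keys == sorted(keys))        # True
print({k.version for k in keys})   # {7}
```

Python compares UUIDs by their 128-bit integer value, which is the ordering this layout is designed for. Database engines may order GUIDs differently; SQL Server's uniqueidentifier comparison, for instance, gives most weight to the last six bytes. Check how your target database sorts them before relying on this property.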

Recommendations

When designing systems that require data synchronization:

Consider using GUIDs instead of auto-increment integers for primary keys
Evaluate sequential GUID generation for better performance
Understand that auto-increment values, while simple, can complicate synchronization scenarios
Plan for the infrastructure needed to maintain consistent primary key generation across your distributed system

Conclusion

The choice of primary key strategy significantly impacts your system’s ability to handle data synchronization effectively. While auto-increment integers might seem simpler at first, understanding their implementation details across different databases reveals why GUIDs often provide a more robust solution for distributed systems.

Remember: Data synchronization is not a trivial problem, and your primary key strategy plays a crucial role in its success. Take the time to evaluate your requirements and choose the appropriate approach for your specific use case.

Till next time, happy delta encoding.

SyncFramework Update: Now Supporting .NET 9 and EfCore 9!

by Joche Ojeda | Jan 21, 2025 | ADO.NET, C#, Data Synchronization, EfCore


SyncFramework is a C# library that simplifies data synchronization using delta encoding technology. Instead of transferring entire datasets, it efficiently synchronizes by tracking and transmitting only the changes between data versions, significantly reducing bandwidth and processing overhead.
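The core idea of delta encoding can be sketched in a few lines of Python. This is not SyncFramework's API. Delta, compute_deltas and apply_deltas are hypothetical names, and the sketch derives changes by comparing two keyed snapshots, whereas a production engine may instead capture changes as they are made. The sketch shows that only the changed rows need to travel, and that replaying them on another copy reproduces the new version.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Delta:
    """One change between two versions of a keyed dataset."""
    op: str              # "insert", "update" or "delete"
    key: str
    value: object = None


def compute_deltas(old, new):
    """Return only the changes needed to turn `old` into `new`."""
    deltas = []
    for key, value in new.items():
        if key not in old:
            deltas.append(Delta("insert", key, value))
        elif old[key] != value:
            deltas.append(Delta("update", key, value))
    for key in old:
        if key not in new:
            deltas.append(Delta("delete", key))
    return deltas


def apply_deltas(data, deltas):
    """Replay deltas on a copy of `data`, e.g. on another replica."""
    result = dict(data)
    for delta in deltas:
        if delta.op == "delete":
            result.pop(delta.key, None)
        else:  # insert and update both set the row's value
            result[delta.key] = delta.value
    return result


# 1,000 rows on both sides, but only three of them changed.
v1 = {f"c{i}": f"customer {i}" for i in range(1, 1001)}
v2 = dict(v1)
v2["c2"] = "renamed customer"
del v2["c3"]
v2["c1001"] = "new customer"

deltas = compute_deltas(v1, v2)
print(len(deltas))                     # 3
print(apply_deltas(v1, deltas) == v2)  # True
```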

What’s New

All packages now target .NET 9
BIT.Data.Sync packages updated to support the latest framework
Entity Framework Core packages upgraded to EF Core 9
Various minor fixes and improvements

Available Implementations

SyncFramework for XPO: For DevExpress XPO users
SyncFramework for Entity Framework Core: For EF Core users

Package Statistics

Our packages have been serving the community well, with steady adoption:

BIT.Data.Sync: 2,142 downloads
BIT.Data.Sync.AspNetCore: 1,064 downloads
BIT.Data.Sync.AspNetCore.Xpo: 521 downloads
BIT.Data.Sync.EfCore: 1,691 downloads
BIT.Data.Sync.EfCore.Npgsql: 1,120 downloads
BIT.Data.Sync.EfCore.Pomelo.MySql: 1,172 downloads
BIT.Data.Sync.EfCore.Sqlite: 887 downloads
BIT.Data.Sync.EfCore.SqlServer: 982 downloads

Resources

NuGet Packages
Source Code

As always, you can compile the source code yourself from our GitHub repository. The framework continues to provide reliable data synchronization across different platforms and databases.

Happy Delta Encoding!

Divide and Conquer: Subtle Strategies for Supercharging Your Database Performance

by Joche Ojeda | Dec 4, 2023 | Database

Database Table Partitioning

Database table partitioning is a strategy used to divide a large database table into smaller, manageable segments, known as partitions, while maintaining the overall structure and functionality of the table. This technique is implemented in database management systems like Microsoft SQL Server (MSSQL) and PostgreSQL (Postgres).

What is Database Table Partitioning?

Database table partitioning involves breaking down a large table into smaller segments. Each partition contains a subset of the table’s data, based on specific criteria such as date ranges or geographic locations. This allows for more efficient data management and can significantly improve performance for certain types of queries.

Impact of Partitioning on CRUD Operations

Create: Routes each new record to the appropriate partition automatically; inserts that touch smaller partitions and indexes can be faster.
Read: Enhances query performance because searches can be limited to the relevant partitions (partition pruning), accelerating read operations.
Update: Updates within a partition are efficient, but changing the partitioning key moves the row to another partition, which adds overhead.
Delete: Simplifies and speeds up deletion, especially when an entire partition can be dropped instead of deleting rows one by one.
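The routing, pruning and partition-drop behaviour described in the list above can be modelled with a small in-memory sketch. RangePartitionedTable is a hypothetical class, not a database API. It follows the range semantics of the PostgreSQL example later in this post: each partition covers lower bound inclusive, upper bound exclusive.

```python
import bisect
from datetime import date


class RangePartitionedTable:
    """Toy range-partitioned table keyed on a date column.

    `bounds` are ascending partition lower bounds; partition i holds keys
    with bounds[i] <= key < bounds[i + 1] (the last one is open-ended).
    """

    def __init__(self, bounds):
        self.bounds = list(bounds)
        self.partitions = [[] for _ in self.bounds]

    def insert(self, key, row):
        """Create: route the row to exactly one partition."""
        index = bisect.bisect_right(self.bounds, key) - 1
        if index < 0:
            raise ValueError(f"no partition for {key}")
        self.partitions[index].append((key, row))

    def scan(self, lo, hi):
        """Read: visit only partitions that can overlap [lo, hi) (pruning)."""
        first = max(bisect.bisect_right(self.bounds, lo) - 1, 0)
        last = bisect.bisect_left(self.bounds, hi) - 1
        visited = list(range(first, last + 1))
        rows = [row for i in visited
                for key, row in self.partitions[i] if lo <= key < hi]
        return rows, visited

    def drop_partition(self, index):
        """Delete: discard a whole partition without touching the others."""
        removed = len(self.partitions[index])
        self.partitions[index] = []
        return removed


logs = RangePartitionedTable([date(2020, m, 1) for m in range(1, 13)])
logs.insert(date(2020, 1, 15), "login")
logs.insert(date(2020, 2, 3), "logout")
logs.insert(date(2020, 2, 20), "error")

rows, visited = logs.scan(date(2020, 2, 1), date(2020, 3, 1))
print(rows)                    # ['logout', 'error']
print(visited)                 # [1]  (only the February partition is read)
print(logs.drop_partition(0))  # 1    (January removed in one step)
```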

Advantages of Database Table Partitioning

Improved Performance: Particularly for read operations, partitioning can significantly enhance query speeds.
Easier Data Management: Managing smaller partitions is more straightforward.
Efficient Maintenance: Maintenance tasks can be conducted on individual partitions.
Organized Data Structure: Helps in logically organizing data.

Disadvantages of Database Table Partitioning

Increased Complexity: Adds complexity to database management.
Resource Overhead: May require more disk space and memory.
Uneven Performance Risks: Incorrect partition sizing or data distribution can lead to bottlenecks.

MSSQL Server: Example Scenario

In MSSQL, table partitioning is built from two objects. A partition function defines the boundary values, and a partition scheme maps the resulting partitions to filegroups. For example, a SalesData table can be partitioned by year, enhancing CRUD operation efficiency. Here's an example of how you might partition a table in MSSQL:

-- Create a partition function (RANGE RIGHT: each boundary value starts a new partition)
CREATE PARTITION FUNCTION SalesDataYearPF (int)
AS RANGE RIGHT FOR VALUES (2015, 2016, 2017, 2018, 2019, 2020);

-- Create a partition scheme that places every partition on the PRIMARY filegroup
CREATE PARTITION SCHEME SalesDataYearPS
AS PARTITION SalesDataYearPF ALL TO ([PRIMARY]);

-- Create a table partitioned on the SalesYear column
CREATE TABLE SalesData
(
    SalesID int IDENTITY(1,1) NOT NULL,
    SalesYear int NOT NULL,
    SalesAmount decimal(10,2) NOT NULL
) ON SalesDataYearPS (SalesYear);
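To see which partition a given year lands in, SQL Server offers the $PARTITION function, for example SELECT $PARTITION.SalesDataYearPF(2017). The same mapping is sketched below in Python. partition_number is a hypothetical helper that mimics the rule rather than calling SQL Server. The six boundary values produce seven partitions. Under RANGE RIGHT a boundary value belongs to the partition on its right, so 2015 opens partition 2. Under RANGE LEFT, 2015 would close partition 1 instead.

```python
import bisect

BOUNDARIES = [2015, 2016, 2017, 2018, 2019, 2020]  # from SalesDataYearPF


def partition_number(value, boundaries=BOUNDARIES, range_right=True):
    """Return the 1-based partition a value maps to, like $PARTITION.

    RANGE RIGHT: each boundary value belongs to the partition on its right.
    RANGE LEFT:  each boundary value belongs to the partition on its left.
    """
    if range_right:
        return bisect.bisect_right(boundaries, value) + 1
    return bisect.bisect_left(boundaries, value) + 1


for year in (2014, 2015, 2019, 2020, 2024):
    print(year, partition_number(year), partition_number(year, range_right=False))
# 2014 1 1
# 2015 2 1
# 2019 6 5
# 2020 7 6
# 2024 7 7
```

Years before 2015 share partition 1 and years from 2020 onward share partition 7. New boundaries (for example with ALTER PARTITION FUNCTION ... SPLIT RANGE) are needed as new years arrive, or partition 7 keeps growing.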

PostgreSQL: Example Scenario

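PostgreSQL 10 and later support declarative partitioning with PARTITION BY; earlier versions emulated partitioning through table inheritance. For example, a rapidly growing logs table can be partitioned monthly, optimizing CRUD operations. Here's an example of how you might partition a table in PostgreSQL: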

-- Create the partitioned (parent) table
CREATE TABLE logs (
    logdate DATE NOT NULL,
    logevent TEXT
) PARTITION BY RANGE (logdate);

-- Create monthly partitions (lower bound inclusive, upper bound exclusive)
CREATE TABLE logs_y2020m01 PARTITION OF logs
    FOR VALUES FROM ('2020-01-01') TO ('2020-02-01');

CREATE TABLE logs_y2020m02 PARTITION OF logs
    FOR VALUES FROM ('2020-02-01') TO ('2020-03-01');
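Writing twelve of these statements per year by hand is error-prone, especially the December range that must end on January 1 of the following year. The Python sketch below generates them using the logs_yYYYYmMM naming from the example. month_start and monthly_partition_ddl are hypothetical helpers, and the parent table name is assumed to be a trusted identifier, since it is interpolated into SQL without quoting.

```python
from datetime import date


def month_start(year, month):
    """First day of the given month, rolling month 13 into next January."""
    year += (month - 1) // 12
    month = (month - 1) % 12 + 1
    return date(year, month, 1)


def monthly_partition_ddl(parent, year):
    """Yield CREATE TABLE ... PARTITION OF statements for every month of `year`.

    The upper bound is exclusive, so each range ends on the first day of
    the following month.
    """
    for month in range(1, 13):
        lo = month_start(year, month)
        hi = month_start(year, month + 1)
        yield (
            f"CREATE TABLE {parent}_y{year}m{month:02d} PARTITION OF {parent}\n"
            f"    FOR VALUES FROM ('{lo.isoformat()}') TO ('{hi.isoformat()}');"
        )


statements = list(monthly_partition_ddl("logs", 2020))
print(statements[1])
# CREATE TABLE logs_y2020m02 PARTITION OF logs
#     FOR VALUES FROM ('2020-02-01') TO ('2020-03-01');
print(statements[-1])
# CREATE TABLE logs_y2020m12 PARTITION OF logs
#     FOR VALUES FROM ('2020-12-01') TO ('2021-01-01');
```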

Conclusion

Database table partitioning in MSSQL and Postgres significantly affects CRUD operations. While offering benefits like improved query speed and streamlined data management, it also introduces complexities and demands careful planning. By understanding the advantages and disadvantages of partitioning, and by using the appropriate SQL commands for your specific database system, you can effectively implement this powerful tool in your data management strategy.
