SQL Server 2025 — The AI-Ready Database with Vector Search, Native JSON, and RegEx

Posted on: 4/26/2026 6:13:08 PM

vectorNew AI data type
RegEx7 Regular Expression functions
2GBNative JSON per row
50GBExpress Edition (5x increase)

SQL Server 2025 (version 17.x) was officially released in November 2025, marking the most significant leap for Microsoft's relational database platform in the past decade. Beyond simple performance upgrades, this version introduces entirely new capabilities: vector data type for AI/ML, native JSON columns, Regular Expressions in T-SQL, real-time Change Event Streaming, and the ability to invoke AI models directly from within the engine. This article provides an in-depth analysis of each major feature.

1. Vector Data Type — SQL Server Becomes a Vector Database

The standout feature of SQL Server 2025 is the native vector data type, enabling storage and querying of vector embeddings directly in the database without external services.

1.1 Using Vectors

-- Create table with vector column
CREATE TABLE Products (
    Id INT IDENTITY(1,1) PRIMARY KEY,
    Name NVARCHAR(200),
    Description NVARCHAR(MAX),
    Embedding vector(1536)  -- 1536 dimensions, compatible with OpenAI text-embedding-3-small
);

-- Insert vector as JSON array
INSERT INTO Products (Name, Description, Embedding)
VALUES (
    N'Gaming Laptop',
    N'High-performance laptop for gamers',
    '[0.0123, -0.0456, 0.0789, ...]'  -- 1536 float values
);

Vectors are stored in an optimized binary format but displayed as JSON arrays. Each element can be stored as single-precision (4 bytes) or half-precision (2 bytes) floating-point values.

1.2 Vector Search with DiskANN Index

SQL Server 2025 uses the DiskANN (Disk-based Approximate Nearest Neighbor) algorithm — the same technology Microsoft Research developed for Bing Search — to perform ultra-fast similarity searches.

-- Enable preview features
ALTER DATABASE SCOPED CONFIGURATION SET PREVIEW_FEATURES = ON;

-- Create vector index
CREATE VECTOR INDEX IX_Products_Embedding
ON Products(Embedding)
WITH (metric = 'cosine', type = 'diskann');

-- Find top 10 most similar products
SELECT TOP 10 p.Name, p.Description,
    VECTOR_DISTANCE('cosine', p.Embedding, @queryVector) AS distance
FROM VECTOR_SEARCH(Products, 'IX_Products_Embedding', @queryVector, 10) AS vs
INNER JOIN Products p ON p.Id = vs.Id
ORDER BY distance;

New Vector Functions

VECTOR_DISTANCE — calculates distance between 2 vectors (cosine, euclidean, dot product). VECTOR_NORM — returns vector magnitude. VECTOR_NORMALIZE — normalizes a vector. VECTORPROPERTY — retrieves properties (dimensions, data type).

1.3 Direct AI Model Integration

SQL Server 2025 allows defining external AI models directly in the database, then calling them to generate embeddings or text:

-- Define AI model
CREATE EXTERNAL MODEL EmbeddingModel
WITH (
    MODEL_TYPE = EMBEDDINGS,
    LOCATION = 'https://your-openai.openai.azure.com/',
    API_KEY = 'your-api-key',
    DEPLOYMENT = 'text-embedding-3-small'
);

-- Generate embeddings directly in T-SQL
SELECT AI_GENERATE_EMBEDDINGS(EmbeddingModel, Description)
FROM Products
WHERE Embedding IS NULL;

-- Chunk large text before embedding
SELECT * FROM AI_GENERATE_CHUNKS(
    'sentence',   -- chunk type
    512,          -- chunk size
    LongDescription
) AS chunks;
graph LR
    A[Text Data] --> B[AI_GENERATE_CHUNKS]
    B --> C[Text Chunks]
    C --> D[AI_GENERATE_EMBEDDINGS]
    D --> E[Vector Embeddings]
    E --> F[Vector Index - DiskANN]
    F --> G[VECTOR_SEARCH]
    G --> H[Similar Results]
    style A fill:#f8f9fa,stroke:#e94560,color:#2c3e50
    style F fill:#e94560,stroke:#fff,color:#fff
    style H fill:#4CAF50,stroke:#fff,color:#fff
Complete Semantic Search pipeline within SQL Server 2025

2. Native JSON — No Longer Text, Now a First-Class Citizen

SQL Server 2025 introduces a native json data type, stored in an optimized binary format instead of plain text like previous versions. Maximum capacity reaches 2GB per row.

-- Using native JSON type
CREATE TABLE Orders (
    Id INT IDENTITY(1,1) PRIMARY KEY,
    OrderData json,  -- native JSON type, no more NVARCHAR(MAX)
    CreatedAt DATETIME2 DEFAULT GETUTCDATE()
);

-- Insert JSON
INSERT INTO Orders (OrderData)
VALUES ('{"customer":"John Doe","items":[{"sku":"LAP001","qty":1,"price":1299.99}],"total":1299.99}');

-- Query JSON with path expressions
SELECT
    JSON_VALUE(OrderData, '$.customer') AS Customer,
    JSON_VALUE(OrderData, '$.total') AS Total,
    JSON_QUERY(OrderData, '$.items') AS Items
FROM Orders;

2.1 JSON Aggregation Functions

Two new aggregate functions allow building JSON directly from query results:

-- JSON_OBJECTAGG — create JSON object from aggregation
SELECT JSON_OBJECTAGG(CategoryName : ProductCount)
FROM (
    SELECT c.Name AS CategoryName, COUNT(*) AS ProductCount
    FROM Products p
    JOIN Categories c ON p.CategoryId = c.Id
    GROUP BY c.Name
) sub;
-- Result: {"Electronics":42,"Clothing":28,"Books":15}

-- JSON_ARRAYAGG — create JSON array from aggregation
SELECT JSON_ARRAYAGG(Name ORDER BY Price DESC)
FROM Products
WHERE CategoryId = 1;
-- Result: ["MacBook Pro","ThinkPad X1","Dell XPS 15"]

Native JSON Performance

The native json type is stored in binary format, making queries significantly faster compared to the traditional NVARCHAR(MAX) + JSON_VALUE() approach. No text parsing on each query — the engine reads directly from the binary representation.

3. Regular Expressions in T-SQL

After decades of community requests, SQL Server 2025 finally integrates 7 RegEx functions directly into T-SQL, eliminating the need for CLR functions or external processing.

FunctionDescriptionExample
REGEXP_LIKECheck if pattern matchesREGEXP_LIKE(Email, '^[a-z]+@[a-z]+\.[a-z]{2,}$')
REGEXP_REPLACEReplace by patternREGEXP_REPLACE(Phone, '[^0-9]', '')
REGEXP_SUBSTRExtract substringREGEXP_SUBSTR(Log, '\d{4}-\d{2}-\d{2}')
REGEXP_INSTRPosition of matchREGEXP_INSTR(Text, 'error|warning', 1, 1)
REGEXP_COUNTCount matchesREGEXP_COUNT(Content, '\bhttps?://\S+')
REGEXP_MATCHESReturn table of captured groupsREGEXP_MATCHES(Html, '<a href="([^"]+)">')
REGEXP_SPLIT_TO_TABLESplit string by pattern into tableREGEXP_SPLIT_TO_TABLE(CSV, ',\s*')
-- Validate email format
SELECT * FROM Users
WHERE REGEXP_LIKE(Email, '^[\w.+-]+@[\w-]+\.[\w.]+$');

-- Extract all phone numbers from text
SELECT value AS PhoneNumber
FROM Contacts
CROSS APPLY REGEXP_MATCHES(Notes, '(\+?\d{1,3}[-.\s]?\(?\d{1,4}\)?[-.\s]?\d{1,4}[-.\s]?\d{1,9})');

-- Clean data — remove HTML tags
UPDATE Articles
SET PlainText = REGEXP_REPLACE(HtmlContent, '<[^>]+>', '');

4. Change Event Streaming — Next-Gen CDC

Change Event Streaming (CES) is an entirely new feature that streams data changes (DML) in real-time to Azure Event Hubs as CloudEvents (JSON or Avro Binary). This is a major advancement over traditional Change Data Capture (CDC).

sequenceDiagram
    participant App as Application
    participant SQL as SQL Server 2025
    participant CES as Change Event Streaming
    participant EH as Azure Event Hubs
    participant Consumer as Downstream Services
    App->>SQL: INSERT/UPDATE/DELETE
    SQL->>CES: Capture change events
    CES->>EH: Publish CloudEvent (JSON/Avro)
    EH->>Consumer: Stream to consumers
    Note over CES,EH: Near real-time, includes old + new values
Change Event Streaming workflow

CES vs Traditional CDC

CDC stores changes in internal tables; consumers must poll to read them. CES pushes directly to Event Hubs, including both before and after values, in standard CloudEvent format. CES is ideal for event-driven architectures, real-time analytics, and cross-system synchronization.

5. Built-in REST API

SQL Server 2025 can call REST endpoints from within the engine via sp_invoke_external_rest_endpoint, enabling direct integration with external services without middleware:

-- Call Azure OpenAI from SQL Server
DECLARE @response NVARCHAR(MAX);
EXEC sp_invoke_external_rest_endpoint
    @url = 'https://your-resource.openai.azure.com/openai/deployments/gpt-4o/chat/completions?api-version=2024-02-15-preview',
    @method = 'POST',
    @headers = '{"api-key":"your-api-key"}',
    @payload = '{"messages":[{"role":"user","content":"Classify this product: Dell XPS 15 Laptop"}]}',
    @response = @response OUTPUT;

-- Call Azure Function for business logic
EXEC sp_invoke_external_rest_endpoint
    @url = 'https://your-func.azurewebsites.net/api/ProcessOrder',
    @method = 'POST',
    @payload = '{"orderId": 12345}';

6. Fuzzy String Matching

Four new functions enable approximate matching, particularly useful for data deduplication and flexible searching:

-- Find similar customer names (typo tolerance)
SELECT c1.Name, c2.Name,
    EDIT_DISTANCE(c1.Name, c2.Name) AS EditDist,
    JARO_WINKLER_SIMILARITY(c1.Name, c2.Name) AS Similarity
FROM Customers c1
CROSS JOIN Customers c2
WHERE c1.Id < c2.Id
    AND JARO_WINKLER_SIMILARITY(c1.Name, c2.Name) > 85;

-- Result: "John Smith" vs "Jon Smith" → Similarity = 95

7. Zero Trust Security

SQL Server 2025 significantly enhances security with a Zero Trust philosophy:

  • PBKDF2 by default — Password hashing uses PBKDF2 instead of SHA-512, compliant with NIST SP 800-63b
  • TLS 1.3 + TDS 8.0 — Strongest connection encryption, applied to all: replication, linked servers, log shipping, PolyBase, Always On
  • Microsoft Entra ID — Managed identity for outbound connections (to Azure Blob, Key Vault) and inbound (external users)
  • OAEP Padding — RSA encryption uses OAEP instead of legacy PKCS#1 v1.5
  • Custom password policy on Linux — Enforce policies for SQL auth logins on Linux

8. Performance — Next-Gen Intelligent Query Processing

graph TD
    A[Intelligent Query Processing] --> B[CE Feedback for Expressions]
    A --> C[Optional Parameter Plan - OPPO]
    A --> D[DOP Feedback - ON by default]
    A --> E[Query Store on Secondary]
    A --> F[ABORT_QUERY_EXECUTION hint]
    B --> B1[Learns from previous executions
Auto-selects optimal CE model] C --> C1[Multi-plan for same query
Selects plan per parameter value] D --> D1[Auto-adjusts parallelism
Based on actual workload] style A fill:#e94560,stroke:#fff,color:#fff style B fill:#f8f9fa,stroke:#e94560,color:#2c3e50 style C fill:#f8f9fa,stroke:#e94560,color:#2c3e50 style D fill:#f8f9fa,stroke:#e94560,color:#2c3e50 style E fill:#f8f9fa,stroke:#e94560,color:#2c3e50 style F fill:#f8f9fa,stroke:#e94560,color:#2c3e50
New Intelligent Query Processing features in SQL Server 2025

8.1 Other Engine Improvements

FeatureDetailsBenefit
Optimized LockingReduces blocking and lock memoryAvoids lock escalation, higher throughput
ZSTD Backup CompressionZSTD algorithm replaces LZ77Faster, better compression ratio
Tempdb GovernanceResource Governor manages tempdb spacePrevents runaway workloads from causing outages
Optimized sp_executesqlSerializes compilation for dynamic SQLReduces compilation storms
Columnstore ImprovementsOrdered nonclustered, online buildSignificantly faster analytics queries
ADR in TempdbAccelerated Database Recovery for tempdbFaster recovery for temp table transactions

9. Notable Edition Changes

Express Edition — 50GB, a 5x Increase

SQL Server Express 2025 raises the database limit from 10GB to 50GB, while including all features previously exclusive to Express with Advanced Services. The Web Edition has been discontinued. Standard Edition now supports up to 32 cores and 256GB buffer pool.

10. Other T-SQL Enhancements

-- String concatenation with || (ANSI standard)
SELECT FirstName || ' ' || LastName AS FullName FROM Users;

-- SUBSTRING without length (defaults to end of string)
SELECT SUBSTRING(Email, CHARINDEX('@', Email) + 1) AS Domain FROM Users;

-- DATEADD supports bigint
SELECT DATEADD(MILLISECOND, CAST(9999999999 AS BIGINT), '2020-01-01');

-- UNISTR for Unicode encoding
SELECT UNISTR('Hello\0020World\0021');  -- "Hello World!"

-- PRODUCT aggregate
SELECT CategoryId, PRODUCT(Multiplier) AS CumulativeMultiplier
FROM Adjustments
GROUP BY CategoryId;

-- BASE64 encoding/decoding
SELECT BASE64_ENCODE(CAST('Hello SQL Server 2025' AS VARBINARY(MAX)));
SELECT CAST(BASE64_DECODE('SGVsbG8gU1FMIFNlcnZlciAyMDI1') AS VARCHAR(MAX));

-- CURRENT_DATE (ANSI standard)
SELECT CURRENT_DATE;

11. Fabric Mirroring — Bridging On-Premises and Cloud

SQL Server 2025 supports continuous mirroring to Microsoft Fabric, enabling data replication from on-premises to Fabric for analytics with Power BI, Data Engineering, and AI workloads without complex ETL pipelines.

graph LR
    A[SQL Server 2025
On-premises] -->|Continuous Mirroring| B[Microsoft Fabric] B --> C[Power BI
Reports] B --> D[Data Engineering
Notebooks] B --> E[AI/ML
Workloads] A -->|Resource Governor| F[Resource Control
for Mirroring] style A fill:#2c3e50,stroke:#fff,color:#fff style B fill:#e94560,stroke:#fff,color:#fff style C fill:#f8f9fa,stroke:#e0e0e0,color:#2c3e50 style D fill:#f8f9fa,stroke:#e0e0e0,color:#2c3e50 style E fill:#f8f9fa,stroke:#e0e0e0,color:#2c3e50
Fabric Mirroring architecture from SQL Server 2025

12. MCP Server for SQL Server

SQL Server 2025 integrates a SQL MCP Server through Data API Builder, allowing AI agents (including Azure AI Foundry agents) to connect directly to the database via the MCP protocol. This is a significant step toward making SQL Server a secure "data source" for Agentic AI systems.

Conclusion

SQL Server 2025 isn't just an upgrade — it's a redefinition of the relational database's role in the AI era. With integrated vector search, native JSON, RegEx in T-SQL, and the ability to invoke AI models directly, SQL Server 2025 enables developers to build AI-powered applications without leaving the familiar ecosystem. Notably, the free 50GB Express Edition makes these features accessible to projects of all sizes.

References: