
It's 2021, and someone doing big data is still ignoring JSON to study how to transform XML into XML with C#

In big data project development, ETL (Extract-Transform-Load) is essential. Even though JSON is very popular today, developers still face the challenge of integrating with legacy systems, and XML data sources are the classic case, reeking of old money from head to toe.

Thanks to excellent JSON frameworks such as Newtonsoft.Json, deserializing a JSON string is easy for developers. XML is not so convenient: although .NET has built-in support for XML serialization and deserialization, that support is awkward to use when integrating external data.

Reading data with XmlReader

The most efficient, and also the most troublesome, way to extract target data from XML is to use XmlReader directly. Take the following document:

<employee xmlns="urn:empl-hire">
<ID>12365</ID>
<hire-date>2003-01-08</hire-date>
<title>Accountant</title>
</employee>

The following code reads the hireDate.xml file shown above:

using System;
using System.Xml;

using (XmlReader reader = XmlReader.Create("hireDate.xml"))
{
    // Move to the hire-date element.
    reader.MoveToContent();
    reader.ReadToDescendant("hire-date");

    // Return the hire-date as a DateTime object.
    DateTime hireDate = reader.ReadElementContentAsDateTime();
    Console.WriteLine("Six Month Review Date: {0}", hireDate.AddMonths(6));
}

Output :

Six Month Review Date: 7/8/2003 12:00:00 AM
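When more than one value has to be pulled out, the same forward-only reader can be driven in a loop. The following is only a minimal sketch under stated assumptions: the class TitleScanner, the method ReadTitles, and the inline sample with several employee elements are hypothetical and do not come from the original demo. It also shows why this approach is troublesome: ReadElementContentAsString already moves the reader past the closing tag, so calling Read() again on that branch could skip an element.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class TitleScanner
{
    // Collects the text of every <title> element in one forward-only pass.
    public static List<string> ReadTitles(string xml)
    {
        var titles = new List<string>();
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.LocalName == "title")
                {
                    // Reads the text and leaves the reader on the node after </title>,
                    // so this branch must not call Read() again.
                    titles.Add(reader.ReadElementContentAsString());
                }
                else
                {
                    reader.Read();
                }
            }
        }
        return titles;
    }

    public static void Main()
    {
        // Hypothetical sample data, modeled on the employee document above.
        const string xml =
            "<employees>" +
            "<employee><title>Accountant</title></employee>" +
            "<employee><title>Engineer</title></employee>" +
            "</employees>";

        foreach (string title in ReadTitles(xml))
        {
            Console.WriteLine(title);
        }
    }
}
```

Matching on LocalName rather than Name keeps the check independent of any namespace prefix used in the source document.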

Reading data with XDocument

Since .NET Framework 3.5, developers can use XDocument to generate and parse XML documents, which is much more convenient than XmlReader:

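using System;
using System.Xml.Linq;
using System.Xml.XPath; // provides the XPathSelectElement extension method

string str =
@"<?xml version=""1.0""?>
<!-- comment at the root level -->
<Root>
<Child>Content</Child>
</Root>";
XDocument doc = XDocument.Parse(str);
Console.WriteLine(doc.XPathSelectElement("//Child"));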

Output :

<Child>Content</Child>

But hard-coded XPath expressions are hard to debug, and you have to guard against null references all the time. When the XML format is complex or the project is fairly large, this approach becomes inconvenient.
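The null-reference concern can be made concrete with a small sketch. It is an illustration only, not the author's code: the class XmlLookup, the method ReadChildOrDefault, and the fallback text are hypothetical names. It relies on the fact that Element() returns null for a missing node and that the explicit string conversion defined on XElement maps a null element to a null string.

```csharp
using System;
using System.Xml.Linq;

public static class XmlLookup
{
    // Returns the text of the first <Child> under the root, or the fallback
    // when the element is absent, instead of throwing NullReferenceException.
    public static string ReadChildOrDefault(string xml, string fallback)
    {
        XDocument doc = XDocument.Parse(xml);

        // Element() yields null for a missing node; casting a null XElement
        // to string yields null rather than throwing.
        string value = (string)doc.Root?.Element("Child");
        return value ?? fallback;
    }

    public static void Main()
    {
        Console.WriteLine(ReadChildOrDefault("<Root><Child>Content</Child></Root>", "(missing)"));
        Console.WriteLine(ReadChildOrDefault("<Root />", "(missing)"));
    }
}
```

Every field read from the source needs a guard like this, which is part of why the approach scales poorly as the format grows.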

A technology for converting XML into XML: XSLT

In computer science, Extensible Stylesheet Language Transformations (XSLT) is a stylesheet transformation markup language that can convert an XML document into another XML document or into other formats, such as an HTML page or plain text. The final T in XSLT stands for "transformation".

Simply put, developers can write an XML file using XSLT and use that file to convert one XML format into another. That is, when integrating an XML data source with a complex format, developers can write a file with the .xsl extension and use it to convert the source format into the format they need (for example, one that XML deserialization can consume directly).

Start with a simple XML file:

<?xml version="1.0" encoding="ISO-8859-1"?>
<catalog>
<cd>
<title>Empire Burlesque</title>
<artist>Bob Dylan</artist>
<country>USA</country>
<company>Columbia</company>
<price>10.90</price>
<year>1985</year>
</cd>
.
.
.
</catalog>

Opened directly in a browser, this file is displayed as a plain XML tree.

Suppose we only care about the title values. We can use the following cdcatalog.xsl file, which converts cdcatalog.xml into the format that XmlSerializer expects:

<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/>
<xsl:template match="/">
<ArrayOfString xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<xsl:for-each select="catalog/cd">
<string>
<xsl:value-of select="title"/>
</string>
</xsl:for-each>
</ArrayOfString>
</xsl:template>
</xsl:stylesheet>

To observe the transformation result directly in the browser, you can link the XSL stylesheet to the XML file: adding an XSL stylesheet reference to the XML file ("cdcatalog.xml") is enough.

<?xml version="1.0" encoding="ISO-8859-1"?>
<?xml-stylesheet type="text/xsl" href="cdcatalog.xsl"?>
<catalog>
<cd>
<title>Empire Burlesque</title>
<artist>Bob Dylan</artist>
<country>USA</country>
<company>Columbia</company>
<price>10.90</price>
<year>1985</year>
</cd>
.
.
.
</catalog>

Refresh the browser and open the developer tools to inspect the transformed output.

An online example is also available at https://www.coderbusy.com/demos/2021/1531/cdcatalog.xml .

As the steps above show, debugging an XSL file costs very little: developers can change the XSL file easily and see the result within a short time.

Using XSLT in C#

In C#, you can use XslCompiledTransform to perform XSL transformations. The following code shows the conversion process:

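using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Xsl;

// Load and compile the stylesheet.
XslCompiledTransform xsl = new XslCompiledTransform();
xsl.Load("cdcatalog.xsl");

// Write the transformed document into an in-memory string.
var sb = new StringBuilder();
using (var sw = new StringWriter(sb))
{
  using (var xw = new XmlTextWriter(sw) { Formatting = Formatting.Indented })
  {
    xsl.Transform("cdcatalog.xml", xw);
  }
}

var xml = sb.ToString();
Console.WriteLine(xml);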

The code above produces the following output:

<ArrayOfString xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<string>Empire Burlesque</string>
<string>Hide your heart</string>
<string>Greatest Hits</string>
<string>Still got the blues</string>
<string>Eros</string>
.
.
.
</ArrayOfString>
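In an ETL pipeline the source XML often arrives as a string or stream rather than as a file on disk. As a hedged sketch of that variation (the class InMemoryXslt, the method Apply, and the shortened inline stylesheet and catalog are assumptions for illustration, not part of the original demo), the same XslCompiledTransform can be fed from XmlReader instances:

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class InMemoryXslt
{
    // Applies an XSLT stylesheet held in a string to XML held in a string.
    public static string Apply(string stylesheet, string inputXml)
    {
        var xslt = new XslCompiledTransform();
        using (XmlReader styleReader = XmlReader.Create(new StringReader(stylesheet)))
        {
            xslt.Load(styleReader);
        }

        var output = new StringWriter();
        using (XmlReader inputReader = XmlReader.Create(new StringReader(inputXml)))
        // OutputSettings carries what the stylesheet declared in <xsl:output>.
        using (XmlWriter writer = XmlWriter.Create(output, xslt.OutputSettings))
        {
            xslt.Transform(inputReader, writer);
        }
        return output.ToString();
    }

    public static void Main()
    {
        // Shortened, hypothetical versions of cdcatalog.xsl and cdcatalog.xml.
        const string stylesheet = @"<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>
  <xsl:output method='xml' indent='yes' omit-xml-declaration='yes'/>
  <xsl:template match='/'>
    <ArrayOfString>
      <xsl:for-each select='catalog/cd'>
        <string><xsl:value-of select='title'/></string>
      </xsl:for-each>
    </ArrayOfString>
  </xsl:template>
</xsl:stylesheet>";

        const string input =
            "<catalog>" +
            "<cd><title>Empire Burlesque</title></cd>" +
            "<cd><title>Hide your heart</title></cd>" +
            "</catalog>";

        Console.WriteLine(Apply(stylesheet, input));
    }
}
```

Creating the writer from xslt.OutputSettings lets the stylesheet, rather than the calling code, decide details such as indentation and whether an XML declaration is emitted.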

Deserializing the XML string

Transforming XML is not the goal; getting data objects directly is. The code above completes the format conversion, and the next step is to deserialize the converted XML string:

var xmlSerializer = new XmlSerializer(typeof(List<string>));
using (var ms = new MemoryStream(Encoding.UTF8.GetBytes(xml)))
{
    var list = (List<string>) xmlSerializer.Deserialize(ms);
    foreach (var item in list)
    {
        Console.WriteLine(item);
    }
}    

XmlSerializer performs the deserialization, which produces the following output:

Empire Burlesque
Hide your heart
Greatest Hits
Still got the blues
Eros
...
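One detail worth checking in your own pipeline is the encoding named in the XML declaration. If the transformed string declares an encoding such as utf-16 (which a writer backed by a StringWriter commonly does) and is then re-encoded to UTF-8 bytes, XmlSerializer may reject the mismatch. A possible alternative, sketched here with a hypothetical class XmlListReader and method DeserializeStrings, is to deserialize from a StringReader so that no byte encoding is involved:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.Serialization;

public static class XmlListReader
{
    // Deserializes an <ArrayOfString> document directly from a string.
    public static List<string> DeserializeStrings(string xml)
    {
        var serializer = new XmlSerializer(typeof(List<string>));
        using (var reader = new StringReader(xml))
        {
            // Reading from a TextReader works on characters, so the encoding
            // named in the XML declaration does not have to match any bytes.
            return (List<string>)serializer.Deserialize(reader);
        }
    }

    public static void Main()
    {
        // Hypothetical sample in the shape produced by the stylesheet above.
        const string xml =
            "<?xml version=\"1.0\" encoding=\"utf-16\"?>" +
            "<ArrayOfString>" +
            "<string>Empire Burlesque</string>" +
            "<string>Hide your heart</string>" +
            "</ArrayOfString>";

        foreach (string item in DeserializeStrings(xml))
        {
            Console.WriteLine(item);
        }
    }
}
```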

 

Summary and source code

The transformation and deserialization techniques described in this article have been verified in real production environments, where they handle tens of millions of records without difficulty.

The demo code and data for this article can be found on Gitee: https://gitee.com/coderbusy/demo/tree/master/hello-xslt/HelloXslt .

Copyright notice
This article was written by [Soar, Yi]. Please include a link to the original when reposting. Thank you.
https://cdmana.com/2021/05/20210524222253101F.html
