Bert Hubert
Podcast Appearances
Governments, when they send you documents, they love PDF. PDF has a lot going for it. You have PDF/A, which is an archival form of PDF, which is actually nice because that's the sort of PDF variant where you can say, I'm sure I'll be able to read this PDF 50 years from now. So it only uses embedded fonts. It has no scripting language. It doesn't do anything.
So governments really love their PDF files. The thing is web users do not love their PDF files. Web users want to see web pages. They want to see HTML. They just want to click. And the thing is, when the Dutch government gets a document, it starts life as a Word document, a DOCX document, which is easily processed. Then they convert it to XML, which is also easily processed.
And then finally they convert it to PDF, which is not so nice to process. So in order to make all this stuff work, I found out that there is a sort of official government archive of government documents. And there they also publish the XML. So in the course of this trajectory, I retrieved the document like four times.
And to save it, I have to retrieve it from some kind of official government archival site where there is XML. And that XML actually is glorious. So when someone speaks in Dutch parliament, they make a note at the exact timestamps when they speak. And this is so they can match it up with the video.
But it also allowed me to make these statistics and say, who are our sort of Congress people that talk the most, or that talk least, or that have the longest sentences? And because they log it all so well, I could also ask, who are the fastest talkers of our Congress and who are the slowest talkers? And this is of course not very necessary, but it is a lot of fun.
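The kind of statistic described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `<debate>` and `<speech>` elements and the `speaker`, `start`, and `end` attributes are invented for the example and are not the real schema of the Dutch parliamentary archive.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from datetime import datetime

# Hypothetical sample data: element and attribute names are assumptions,
# not the actual format published by the Dutch government.
SAMPLE_XML = """
<debate>
  <speech speaker="Member A" start="2024-01-16T14:00:00" end="2024-01-16T14:00:30">
    one two three four five six
  </speech>
  <speech speaker="Member B" start="2024-01-16T14:00:30" end="2024-01-16T14:01:00">
    one two three
  </speech>
  <speech speaker="Member B" start="2024-01-16T14:01:00" end="2024-01-16T14:01:30">
    four five six
  </speech>
</debate>
"""

def speaking_rates(xml_text):
    """Return words per minute for each speaker, summed over all speeches."""
    words = defaultdict(int)
    seconds = defaultdict(float)
    root = ET.fromstring(xml_text.strip())
    for speech in root.iter("speech"):
        speaker = speech.get("speaker")
        start = datetime.fromisoformat(speech.get("start"))
        end = datetime.fromisoformat(speech.get("end"))
        # Count whitespace-separated words in the speech text.
        words[speaker] += len("".join(speech.itertext()).split())
        seconds[speaker] += (end - start).total_seconds()
    # Skip speakers with no recorded speaking time to avoid dividing by zero.
    return {s: words[s] / (seconds[s] / 60) for s in words if seconds[s] > 0}

rates = speaking_rates(SAMPLE_XML)
fastest = max(rates, key=rates.get)
slowest = min(rates, key=rates.get)
```

With the sample above, `fastest` is the speaker with the most words per minute of logged speaking time and `slowest` the one with the fewest. The same per-speaker totals would also give "who talks the most" by sorting on `seconds` instead.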
But yeah, I never used to be a fan of XML, but it does actually get the job done in this case. It's actually not so bad.
Yeah, everything is better than the PDF. I mean, I understand that people sort of love it because it feels like it's a standard format and you have everything in there that you need. But to process it, it's nasty. I mean, for example, if you have a two-column PDF file, it's actually not easy to figure out that it is a two-column PDF file.
Because if you look at the PostScript, it's just a sentence and a bunch of spaces and then another sentence, which is the next column. So it's all not that great.
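To make the two-column problem concrete, here is a rough Python sketch of the guesswork involved. It assumes the text has already been extracted from the PDF line by line, and that a run of three or more spaces marks the gap between columns. Both the threshold and the `split_columns` helper are assumptions made for illustration, not a description of any particular PDF tool.

```python
import re

# Assumed heuristic: three or more whitespace characters separate two columns.
GAP = re.compile(r"\s{3,}")

def split_columns(lines):
    """Guess the left and right column text from extracted lines."""
    left, right = [], []
    for line in lines:
        parts = GAP.split(line.strip(), maxsplit=1)
        if parts[0]:
            left.append(parts[0])
        if len(parts) == 2 and parts[1]:
            right.append(parts[1])
    return " ".join(left), " ".join(right)

extracted = [
    "The quick brown fox      Meanwhile the lazy dog",
    "jumps over the fence.    keeps on sleeping.",
]
left_text, right_text = split_columns(extracted)
```

The heuristic breaks as soon as a single-column line contains a wide gap, or a column is ragged enough that the gap shrinks below the threshold, which is exactly why this kind of processing is unpleasant.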
Actually, no, we don't. And I try telling people that our government, our politics are also becoming wilder and less useful. But we haven't yet sunk to the level that we need a filibuster kind of thing. So for now, if you have a simple majority, you can actually get stuff done in Dutch government. But I worry where it's going.
Yeah, yeah, we do. And we have these people that actually, well, they will talk your ears off. And actually it's not always who you think it is. It is sort of funny, when you run these numbers, you find out that it's quite often different from what you thought it would be.